Example usage:
datasette package fivethirtyeight.db \
    --tag fivethirtyeight \
    --metadata=538-metadata.json
This will create a temporary directory, generate a Dockerfile, copy in the
SQLite database and metadata file, then build that as a new Docker image and
tag it in your local Docker repository as fivethirtyeight:latest.
You can then run the image like so:
docker run -p 8006:8001 fivethirtyeight
This will expose port 8001 in the container (the default) as port 8006 on your
host.
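Roughly speaking, the packaging step stages everything in a temporary
directory and then shells out to docker build. Here's a sketch of the idea
(the helper name and the Dockerfile contents are illustrative assumptions,
not the actual implementation):

import shutil
import subprocess
import tempfile
from pathlib import Path

def package(db_path, tag, metadata_path=None):
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)
        # Stage the database (and optional metadata) next to a generated
        # Dockerfile, then build and tag the image from that directory
        shutil.copy(db_path, tmpdir)
        if metadata_path:
            shutil.copy(metadata_path, str(tmp / 'metadata.json'))
        (tmp / 'Dockerfile').write_text(
            'FROM python:3\n'
            'RUN pip install datasette\n'
            'ADD . /app\n'
            'WORKDIR /app\n'
            'EXPOSE 8001\n'
            'CMD ["datasette", "serve", "{}"]\n'.format(Path(db_path).name)
        )
        subprocess.check_call(['docker', 'build', '-t', tag, tmpdir])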
Closes #67
If provided, the --metadata option specifies the path to a JSON file
containing metadata that should be displayed alongside the dataset.
datasette /tmp/fivethirtyeight.db --metadata /tmp/metadata.json
Currently that metadata format looks like this:
{
    "title": "Five Thirty Eight",
    "license": "CC Attribution 4.0 License",
    "license_url": "http://creativecommons.org/licenses/by/4.0/",
    "source": "fivethirtyeight/data on GitHub",
    "source_url": "https://github.com/fivethirtyeight/data"
}
If provided, this metadata will be displayed by the index template and used
to populate the common footer.
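As a rough sketch of how that can be wired up (illustrative, not the actual
code), the file just gets loaded once and passed into the template context:

import json

# Load the metadata file once at startup...
with open('metadata.json') as fp:
    metadata = json.load(fp)

# ...and pass it into the Jinja context, so the index template and the
# common footer can render the title, license and source links
context = {'metadata': metadata}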
The publish command also accepts this argument, and will package any provided
metadata up and include it with the resulting Docker container.
datasette publish --metadata /tmp/metadata.json /tmp/fivethirtyeight.db
Closes #68
Building metadata is now optional. If you want to do it, do this:
datasette build *.db --metadata=metadata.json
Then when you run the server you can tell it to read from that metadata file:
datasette serve *.db --metadata=metadata.json
The Dockerfile generated by datasette publish now uses this mechanism.
Closes #60
You can now call arbitrary SQL like this:
/flights?sql=select%20*%20from%20airports%20where%20country%20like%20:c&c=iceland
Unescaped, those querystring params look like this:
sql = select * from airports where country like :c
c = iceland
So SQL can be constructed with named parameters embedded in it, which will
then be read from the querystring and correctly escaped.
This means we can aggressively filter the SQL parameter for potentially
dangerous syntax. For the moment we enforce that it starts with a SELECT
statement and we ban the sequence "pragma" from it entirely.
If you need the word "pragma" in a query - in a string value, for example -
you can pass it in using the new named parameter mechanism instead.
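To make that concrete, here's a rough sketch of the validation plus parameter
binding (the function name is illustrative, and it assumes a flights.db
containing an airports table, as in the example above):

import sqlite3

def run_custom_sql(conn, sql, params):
    # Enforce the rules described above: SELECT only, no "pragma" anywhere
    normalized = sql.strip().lower()
    if not normalized.startswith('select'):
        raise ValueError('Only SELECT queries are allowed')
    if 'pragma' in normalized:
        raise ValueError('The string "pragma" is not allowed in queries')
    # sqlite3 binds the named :params itself, so querystring values are
    # escaped correctly and never interpolated into the SQL text
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect('file:flights.db?immutable=1', uri=True)
rows = run_custom_sql(
    conn,
    'select * from airports where country like :c',
    {'c': 'iceland'},
)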
Fixes #39
SQLite operations are blocking, but we're running everything in Sanic, an
asyncio web framework, so blocking operations are bad - a long-running DB
operation could hold up the entire server.
Instead, I've moved all SQLite operations into threads. These are managed by a
concurrent.futures ThreadPoolExecutor. This means I can run up to X queries in
parallel (X being the pool size), and continue to accept and queue additional
incoming HTTP traffic while the threadpool is busy.
Each thread is responsible for managing its own SQLite connections - one per
database. These are cached in a threadlocal.
Since we are working with immutable, read-only SQLite databases it should be
safe to share SQLite objects across threads. On this assumption I'm using the
check_same_thread=False option. Opening a database connection looks like this:
import sqlite3

# immutable=1 promises SQLite that the underlying file will never change;
# check_same_thread=False allows the connection to be used from any thread
conn = sqlite3.connect(
    'file:filename.db?immutable=1',
    uri=True,
    check_same_thread=False,
)
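Putting those pieces together, here's a minimal sketch of the pattern (the
helper names are illustrative, not the actual implementation):

import asyncio
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=3)
connections = threading.local()

def execute(db_path, sql, params=None):
    # Runs inside a worker thread: reuse this thread's cached connection
    # for the database, opening one first if necessary
    cache = getattr(connections, 'cache', None)
    if cache is None:
        cache = connections.cache = {}
    if db_path not in cache:
        cache[db_path] = sqlite3.connect(
            'file:{}?immutable=1'.format(db_path),
            uri=True,
            check_same_thread=False,
        )
    return cache[db_path].execute(sql, params or {}).fetchall()

async def execute_in_thread(db_path, sql, params=None):
    # Hand the blocking SQLite work to the pool so the asyncio event
    # loop stays free to accept new requests in the meantime
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(executor, execute, db_path, sql, params)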
The following articles were helpful in figuring this out:
* https://pymotw.com/3/asyncio/executors.html
* https://marlinux.wordpress.com/2017/05/19/python-3-6-asyncio-sqlalchemy/

Closes #45. Refs #38.
I now call a factory function to construct the Sanic app:
app = app_factory(files)
This allows me to pass additional arguments to it, e.g. the files to serve.
Also refactored my class-based views to accept jinja as an argument, e.g.:
app.add_route(
    TableView.as_view(jinja),
    '/<db_name:[^/]+>/<table:[^/]+?><as_json:(.jsono?)?$>'
)
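Sanic's as_view() forwards its extra arguments to the view's constructor,
which is how the shared Jinja environment reaches each request handler. A
rough sketch (the view body here is illustrative, not the real code):

from jinja2 import Environment, FileSystemLoader
from sanic import response
from sanic.views import HTTPMethodView

jinja = Environment(loader=FileSystemLoader('templates'))

class TableView(HTTPMethodView):
    def __init__(self, jinja):
        # TableView.as_view(jinja) passes the environment through to here
        self.jinja = jinja

    async def get(self, request, db_name, table, as_json):
        # Illustrative only: render a template from the shared environment
        template = self.jinja.get_template('table.html')
        return response.html(template.render(db_name=db_name, table=table))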