dbfetch is a utility that will fetch a JSON object (or list of JSON objects) from a web source and flatten it to store relevant fields into a traditional SQL-based relational database. Fields and transformations for flattening JSON blobs are provided in YAML files, the format of which is outlined below.
The notes on this page mostly apply to the classic branch. The master branch went in a slightly new direction that hasn't been tested as thoroughly and may not be as compatible as we'd like.
Models are defined in YAML files under the models subdirectory. dbfetch will try to create a table and add columns to match the model. If a column is added to the model for an existing table, dbfetch will attempt to add those columns to the existing table.
All models must have at least one field with the primary_key property.
With some models present, dbfetch can then be configured to fetch JSON objects from various sources and process them using those models. The configuration file is dbfetch.ini by default, but this can be changed using the -c command-line argument.
The configuration file is split into stanzas, with one stanza per model to define options for that model and multiple location stanzas per model to define different sources to pull JSON objects from for processing and storage with that model.
This example will attempt to process the following JSON object, fetched from http://example.com/aqi.json:
1 | { |
2 | "pm25_standard": 24, |
3 | "relative_humidity": 39.67132568359375, |
4 | } |
This example will also use a simplified version of the model i2cerv.yml, with irrelevant parts (e.g. for plotting) stripped out.
1 | --- |
2 | tablename: i2cerv |
3 | options: {} |
4 | transformations: |
5 | updated: |
6 | - Requester.format_date |
7 | fields: |
8 | location: |
9 | type: int |
10 | primary_key: true |
11 | updated: |
12 | type: datetime |
13 | primary_key: true |
14 | pm25_standard: |
15 | type: float |
16 | relative_humidity: |
17 | type: float |
Finally, the following stanza in the configuration file, dbfetch.ini will activate the model above when it is placed in the models directory and use it to process the JSON object fetched from the example URL into an SQLite database (any valid SQLAlchemy connection string should work):
1 | [i2cerv] |
2 | connection = sqlite:///i2cerv.db |
3 | locations = 1 |
4 | |
5 | [dbfetch-location-1] |
6 | url = http://example.com/aqi.json |
This section is under construction!
Please stay tuned for updates pending research and development.
Please see models/waqi.yml for an example of a model that pulls the children from the object.data.x fields up to multiple object.data_x fields.
This section is under construction!
Please stay tuned for updates pending research and development.
This section is under construction!
Please stay tuned for updates pending research and development.