TL;DR What is JSONSchema?
JSONSchema is a grammar for JSON that is itself written in JSON: a schema describes which keys, types, and structures a JSON document is allowed to have. Implementations exist in most popular programming languages, so it is quite portable.
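To make the "grammar stored as JSON" idea concrete, here is a minimal Node sketch. The schema is parsed with plain JSON.parse, and `checkSubset` is a hypothetical toy validator that understands only four keywords (`type` with a single type name, `properties`, `required`, `items`). Real validators, like the ones used later in this tutorial, implement far more of the specification.

```javascript
// A JSON Schema is ordinary JSON: this one describes an object with a
// required string "name" and an optional array of strings "type".
const pokemonSchema = JSON.parse(`{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "type": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["name"]
}`);

// Map a JavaScript value to the JSON Schema type name it corresponds to.
function jsonType(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  if (Number.isInteger(value)) return 'integer';
  return typeof value; // 'string', 'number', 'boolean' or 'object'
}

// Hypothetical toy validator: supports only type (a single name),
// properties, required and items. Returns error strings; [] means valid.
function checkSubset(value, subschema, path = '$') {
  const actual = jsonType(value);
  if (subschema.type) {
    // Every integer also counts as a JSON Schema "number".
    const ok = actual === subschema.type || (subschema.type === 'number' && actual === 'integer');
    if (!ok) return [`${path}: expected ${subschema.type}, got ${actual}`];
  }
  const errors = [];
  if (actual === 'object') {
    for (const key of subschema.required || []) {
      if (!Object.prototype.hasOwnProperty.call(value, key)) {
        errors.push(`${path}: missing required key "${key}"`);
      }
    }
    for (const [key, sub] of Object.entries(subschema.properties || {})) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        errors.push(...checkSubset(value[key], sub, `${path}.${key}`));
      }
    }
  }
  if (actual === 'array' && subschema.items) {
    value.forEach((item, i) => errors.push(...checkSubset(item, subschema.items, `${path}[${i}]`)));
  }
  return errors;
}

console.log(checkSubset({ name: 'Bulbasaur', type: ['Grass', 'Poison'] }, pokemonSchema)); // []
console.log(checkSubset({ type: ['Grass', 7] }, pokemonSchema)); // missing "name", and $.type[1] is not a string
```

The point is the shape of the idea: the schema is data, and a validator walks the instance and the schema side by side, collecting every mismatch instead of stopping at the first one.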
Next Steps
- Come up with a more specific use case and finish this tutorial
Source Code
TODO
Question Engine and JSONSchema
The grand vision of Dentropy Daemon is to make all the data a person has ever generated accessible through a single API, and then find interesting things to do with it. Making this vision a reality will require a variety of data formats that can be converted into one another. For example, no two messaging apps store messages the same way, even though the messages contain basically the same content. Now imagine all these message formats being able to transform into one another. Just as there is more than one way to skin a chicken, there are many ways to parse the same raw data. And just as a skinned chicken is a skinned chicken, once data is available in a supported JSONSchema format it can fit into any supported ddaemon application.
Help I don't know how to Code
- Check out my Learn to Code document and come back here.
Goals of This Tutorial
- Get some real-world JSON from the web
- Use NodeJS to programmatically infer JSONSchema from JSON
- Add JSON data that is valid against the JSONSchema
- Add JSON data that is NOT valid against the JSONSchema
- Read JSONSchema and write compatible JSON
- Edit raw JSONSchema and write compatible JSON
- Use JSONSchema with Python
- Write your own JSONSchema from scratch
Results of This Tutorial
- You will know how to use JSONSchema in your NodeJS and Python projects
Requirements
Setup
git clone
cd JSONSchema-tutorial
Steps
Download JSON From web
cd JSONSchema-tutorial
mkdir JSON-data
cd JSON-data
curl -o pokedex.json https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/pokedex.json
curl -o ev-data.json "https://data.wa.gov/api/views/f6w7-q2d2/rows.json?accessType=DOWNLOAD"
curl "https://en.wikipedia.org/w/api.php?origin=*&action=query&format=json&formatversion=2&redirects&prop=revisions&rvprop=content&titles=Albert+Einstein" | jq . > wikipedia-Albert-Einstein.json
Install Requirements
git clone ......
npm init -y
npm install jsonschema
npm install -g ajv # JSONSchema Validator
npm install -g ajv-cli # JSONSchema Validator CLI
npm install -g quicktype # JSONSchema Generator
pip install check-jsonschema
Playing with ev-data.json
Infer the JSONSchema
Get JSONSchema from ./JSON-data/ev-data.json
quicktype -l schema -o ev-data-schema.json ./JSON-data/ev-data.json
This JSONSchema is full of gibberish. Before we look into why, let's test the schema that was generated.
ajv -s ev-data-schema.json -d ./JSON-data/ev-data.json
And we get.....
(base) ➜ JSONSchema-tutorial ajv -s ev-data-schema.json -d ./JSON-data/ev-data.json
schema ev-data-schema.json is invalid
error: strict mode: unknown keyword: "qt-uri-protocols"
(base) ➜ JSONSchema-tutorial
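Wow, okay, that failed. The error comes from ajv's strict mode, which is on by default in current versions and rejects keywords it does not recognize, such as quicktype's custom "qt-uri-protocols". Let's set that aside for now and try another JSONSchema validator.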
(base) ➜ JSONSchema-tutorial check-jsonschema --schemafile ./ev-data-schema.json ./JSON-data/ev-data.json
ok -- validation done
Alright, that took a long time, but it did complete successfully.
Let's try a third validator, the jsonschema npm package:
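// test.js
const fs = require('fs');
const { Validator } = require('jsonschema');
const v = new Validator();
// Read the generated schema and the raw data as UTF-8 text, then parse them
const schema = JSON.parse(fs.readFileSync('./ev-data-schema.json', 'utf8'));
const instance = JSON.parse(fs.readFileSync('./JSON-data/ev-data.json', 'utf8'));
// validate() returns a result object; .valid is true when there are no errors
const res = v.validate(instance, schema);
console.log(res.valid); // true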
Result:
(base) ➜ JSONSchema-tutorial node test.js
true
Nice, two of the three validators accepted the schema (ajv rejected it in strict mode). Now let's get in there and understand why the schema is so complex.
There is so much complexity inside the JSONSchema because the data has no regular pattern; for example, the data under the keys .meta and .data have completely different structures. Let's try pulling out those pieces of data separately.
cat ./JSON-data/ev-data.json | jq .meta > ./JSON-data/ev-data-meta.json
cat ./JSON-data/ev-data.json | jq .data > ./JSON-data/ev-data-data.json
Now let's generate the schemas from these subsets of the JSON.
quicktype -l schema -o ev-data-data-schema.json ./JSON-data/ev-data-data.json
quicktype -l schema -o ev-data-meta-schema.json ./JSON-data/ev-data-meta.json
When we take a look inside ev-data-data-schema.json we see a nice, concise type description. When we take a look at ev-data-meta-schema.json we see a whole lot of gibberish, so let's check out ./JSON-data/ev-data-meta.json to understand why.
When you take a look inside ./JSON-data/ev-data-meta.json you will not see any regular patterns, except under the .view.columns key. The JSONSchema has to describe every unique key in the JSON, which is why the JSONSchema, ./ev-data-meta-schema.json, is so complex.
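To see why unique keys blow up a schema while regular arrays stay small, here is a deliberately simplified inference sketch. It is not how quicktype works internally; `inferSchema` is a hypothetical toy that emits one `properties` entry per key it sees and collapses same-shaped array elements into a single `items` schema.

```javascript
// Toy schema inference: a simplified illustration, NOT quicktype's algorithm.
function jsonType(v) {
  if (v === null) return 'null';
  if (Array.isArray(v)) return 'array';
  if (Number.isInteger(v)) return 'integer';
  return typeof v;
}

function inferSchema(value) {
  const type = jsonType(value);
  if (type === 'object') {
    // Every distinct key becomes its own property, so an object with many
    // unique keys produces an equally large schema.
    const properties = {};
    for (const [key, v] of Object.entries(value)) properties[key] = inferSchema(v);
    return { type, properties, required: Object.keys(value) };
  }
  if (type === 'array') {
    // Infer each element and keep only the distinct results (compared by
    // their JSON text), so same-shaped elements collapse into one schema.
    const variants = new Map();
    for (const item of value) {
      const s = inferSchema(item);
      variants.set(JSON.stringify(s), s);
    }
    const unique = [...variants.values()];
    if (unique.length === 0) return { type };
    return { type, items: unique.length === 1 ? unique[0] : { anyOf: unique } };
  }
  return { type };
}

// 1000 same-shaped rows still yield a single "items" schema.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i, label: `row ${i}` }));
console.log(JSON.stringify(inferSchema(rows)));
```

An object like ev-data-meta.json, with many one-off keys and nested objects, gets no such collapsing: every key adds another entry.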
Let's now try to generate a JSONSchema for .view.columns:
cat ./JSON-data/ev-data-meta.json | jq .view.columns > ./JSON-data/ev-data-meta-columns.json
quicktype -l schema -o ev-data-meta-columns-schema.json ./JSON-data/ev-data-meta-columns.json
Now we can take a look in ev-data-meta-columns-schema.json and see that we have a relatively concise JSONSchema.
Since only .data and .meta.view.columns contain regular patterns of information, how can we create a JSONSchema that only checks those JSON paths?
.data and .meta.view.columns are the regular data structures we want to validate. It is possible to write a JSONSchema that validates the entire ev-data.json file, but it is easier to jq our way to victory. Take a look.
Run Commands:
cat ./JSON-data/ev-data.json | jq .data | ajv -s ./ev-data-data-schema.json
# Failed, dammit, I can't pipe into ajv
ajv -s ./ev-data-data-schema.json -d ./JSON-data/ev-data-data.json
Result:
(base) ➜ JSONSchema-tutorial ajv -s ./ev-data-data-schema.json -d ./JSON-data/ev-data-data.json
./JSON-data/ev-data-data.json valid
Nice! Now let's try check-jsonschema.
Run Commands:
cat ./JSON-data/ev-data.json | jq .data | check-jsonschema --schemafile ./ev-data-data-schema.json
# Failed, dammit, I can't pipe into check-jsonschema either
check-jsonschema --schemafile ./ev-data-data-schema.json ./JSON-data/ev-data-data.json
Result:
(base) ➜ JSONSchema-tutorial check-jsonschema --schemafile ./ev-data-data-schema.json ./JSON-data/ev-data-data.json
ok -- validation done
check-jsonschema took a long time, but it was still successful. Now let's also try the jsonschema npm package.
Code:
// test2.js
const fs = require('fs');
const { Validator } = require('jsonschema');
const v = new Validator();
const schema = JSON.parse(fs.readFileSync('./ev-data-data-schema.json', 'utf8'));
const instance = JSON.parse(fs.readFileSync('./JSON-data/ev-data.json', 'utf8'));
// Validate only the .data array, the same subset jq extracted above
const res = v.validate(instance.data, schema);
console.log(res.valid); // true
Run Commands:
node test2.js
Result:
(base) ➜ JSONSchema-tutorial# node test2.js
true
Playing with pokedex.json
quicktype -l schema -o pokedex-schema.json ./JSON-data/pokedex.json
You should now have pokedex-schema.json, which is only 119 lines long and gets straight to the point. Now let's validate the data against it.
Run Command:
ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex.json
Result:
(base) ➜ JSONSchema-tutorial ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex.json
./JSON-data/pokedex.json valid
Nice, that worked. Now let's try the other validator:
Run Command:
check-jsonschema --schemafile ./pokedex-schema.json ./JSON-data/pokedex.json
Result:
(base) ➜ JSONSchema-tutorial check-jsonschema --schemafile ./pokedex-schema.json ./JSON-data/pokedex.json
ok -- validation done
Nice, that was easy. Now let's try the jsonschema npm package:
Code:
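// jsonschema.js
// Usage: node jsonschema.js <schema.json> <data.json>
const fs = require('fs');
const { Validator } = require('jsonschema');
const [schemaPath, dataPath] = process.argv.slice(2);
if (!schemaPath || !dataPath) {
  console.error('Usage: node jsonschema.js <schema.json> <data.json>');
  process.exit(1);
}
const v = new Validator();
const schema = JSON.parse(fs.readFileSync(schemaPath, 'utf8'));
const instance = JSON.parse(fs.readFileSync(dataPath, 'utf8'));
const res = v.validate(instance, schema);
for (const e of res.errors) console.error(e.stack); // explain each failure, if any
console.log(res.valid); // true for a valid document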
Run Commands:
node jsonschema.js ./pokedex-schema.json ./JSON-data/pokedex.json
Result:
(base) ➜ JSONSchema-tutorial node jsonschema.js ./pokedex-schema.json ./JSON-data/pokedex.json
true
Nice, that worked.
Now let's try to invent our own Pokemon and test whether they are compatible with the JSONSchema.
Valid Pokemon Test:
export new_valid_pokemon="$(cat ./JSON-data/valid-new-pokemon.json)"
echo "$new_valid_pokemon"
jq --argjson pokemon "$new_valid_pokemon" '. += [$pokemon]' ./JSON-data/pokedex.json > ./JSON-data/pokedex-valid.json
ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex-valid.json
Invalid Pokemon Test:
export new_invalid_pokemon="$(cat ./JSON-data/invalid-new-pokemon.json)"
echo "$new_invalid_pokemon"
jq --argjson pokemon "$new_invalid_pokemon" '. += [$pokemon]' ./JSON-data/pokedex.json > ./JSON-data/pokedex-invalid.json
ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex-invalid.json
Links
Logs
- What is an example json dataset