SQLite offers through its JSON1 extension a lot of capability. Generating JSON outputs from SQL, even complex structured, multi-level JSON is very straightforward.
But what about the opposite use case ? When you have store data as JSON in an SQLite database, and you want to search or filter it ? This is a short exploration of what to expect.
Setting up the stage
For the purpose of this article, we’ll go for a very simple storage, essentially records with a single JSON
create table test (data text, x float)
As you no doubt noticed, I also added the x column as a reference for benchmarking, it is just here for benchmarking, and would not exist in a self-respecting no-SQL schema.
Next we fill that table with about 120 MB of random, using the following DWScript
var db := new DataBase('SQLite', [ 'C:\Temp\test.sql3' ]); db.StartTransaction; for var i := 1 to 1000000 do begin var x := Random; db.Exec('insert into test (data, x) values (?, ?)', [ JSON.Stringify(record vx := x; vx2 := x*x; vxs := x.ToString; blabla := CryptographicToken; end), x ]); end; db.Commit;
After a few seconds, the table is now full of small JSONs like this one
{ "blabla":"8Pf3-F5Q7tVM7TIOsMyP", "vx":8.30008579760242E-5, "vx2":6.88914242475613E-9, "vxs":"8.30008579760242E-5" }
Classic column and index performance
We can proceed to the next step and run a few queries against it, let’s begin by filtering on x with something like
select data from test where x < 0.0001
without any indexes, on a “classic” column, this means a full table scan, and it takes 169 ms on my core i5 to return about 100 columns.
After creating a simple index on column x
create index test_x_idx on test(x)
The select now runs in 0.15 ms. Let those figures be our baseline.
JSON column and field performance
Instead of querying against the traditional column, we can query against the JSON field
select data from test where json_extract(data, '$.vx') < 0.0001
This is a full table scan again, but it runs in 976 ms, ouch… game over ? not so fast my friend !
SQLite supports indexes on expressions, let’s create one
create index test_extract_vx_idx on test(json_extract(data, '$.vx'))
with that index in place our select with json_extract now runs in 0.15 ms, same as the classic column!
Further skullduggery
To speed up SQLite when used as a no-SQL database, there are two other very efficient tools in my experience:
- Partial indexes which are great both at limiting index size and improving their performance (smaller indexes need less I/O and memory), they have to be handled with caution however
- Full Text Search, because JSON is a sort of text, and having a FTS index on a JSON column can speed up searching not just value but also field names. Just be careful with the tokenization.
- Generated Columns, when everything else fails you, these can be a good fallback