JSON on the Command Line
Recently my coworker Andy Ruder was complaining that he often reached for grep when filtering JSON, and I offered to give him some tips. This post is an expansion of what I told him.
I deal with JSON multiple times a day. Our logs are JSON, so being able to easily read and interact with them is important. I use a small number of tools and techniques, and in general I think that my life with JSON is on par with, say, traditional Unix files that are tables (at best) separated by a single character.
🔗 jq
The primary tool for JSON interaction is the popular jq. Generally speaking the usage of jq is unsurprising and approachable. Here is a typical use of jq:
$ echo '{"user": { "groups":[{"id":1,"name":"frew"},{"id":2,"name":"admin"}]}}' |
jq '.user.groups[].name'
"frew"
"admin"
jq has a couple of flags that you really want to know about. First is -S, which sorts the keys of any objects that it prints out. At some point I am likely to make a jq wrapper to just always turn this on. Second is -r, which disables the quoting and (I think) color coding of the output. We'll use this in a later example.
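A minimal illustration of both, with trivial inputs of my own just to show the effect:
$ echo '{"b":2,"a":1}' | jq -S .
{
  "a": 1,
  "b": 2
}
$ echo '{"name":"frew"}' | jq -r .name
frew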
Another feature jq offers, which is really strange at first, is the ability to filter with simple JavaScript-like expressions. Here's how that works:
$ echo '{"name":"frew","value":"engineer"}{"name":"frooh","value":"pal"}' |
jq 'select(.name == "frew") | .value'
"engineer"
Note that jq almost always allows leaving the | off between its internal pipelines, but it helps my understanding to include it.
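For example, the filter above should give the same result written without the pipe:
$ echo '{"name":"frew","value":"engineer"}{"name":"frooh","value":"pal"}' |
jq 'select(.name == "frew").value'
"engineer"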
jq can understand any documents that are concatenated together, thanks to the fact that JSON is self-terminating. So the above works, newline-terminated works, etc.
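For example, the same filter works when each document is on its own line:
$ printf '{"name":"frew","value":"engineer"}\n{"name":"frooh","value":"pal"}\n' |
jq 'select(.name == "frew") | .value'
"engineer"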
🔗 gron
While jq gives you a nice little DSL for interacting with JSON, gron makes JSON fit in better with typical Unix tools. Here's how you use it:
$ echo '{"user": { "groups":[{"id":1,"name":"frew"},{"id":2,"name":"admin"}]}}' |
gron
json = {};
json.user = {};
json.user.groups = [];
json.user.groups[0] = {};
json.user.groups[0].id = 1;
json.user.groups[0].name = "frew";
json.user.groups[1] = {};
json.user.groups[1].id = 2;
json.user.groups[1].name = "admin";
I use this probably twice a month by running something like this:
$ cat bigfile.json | gron | grep '[email protected]'
json.user[123].email = "[email protected]";
And then to get the rest of the record I use grep -F:
$ cat bigfile.json | gron | grep -F 'json.user[123]'
json.user[123].id = 123;
json.user[123].email = "[email protected]";
json.user[123].name = "Frew Schmidt";
Then, if you are using this with a program, you can pipe to gron -u (ungron) to get JSON back out. Honestly though, I find that mode better for filtering on "columns:"
$ cat bigfile.json | gron | grep -P '\.(id|name) ' | gron -u
...
},
{
"id": 123,
"name": "Frew Schmidt"
}
]
Finally, if it’s not obvious, gron is great for reverse engineering the path of a deeply nested structure, like I did in the second gron example above, but in cases that are not so trivial to eyeball.
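For instance (deep.json and the value here are made up), grepping for a value you can already see gets you back its full path:
$ gron deep.json | grep 'us-west-2'
json.stacks[0].outputs.vpc.region = "us-west-2";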
🔗 csv2json
csv2json (which I have mentioned twice now) is a very simple Perl script, originally implemented by Andrew Farmer, that turns CSV into JSON. Usage is trivial:
$ cat foo.csv | csv2json | jq .
It uses the header of the CSV for column names. This means that it can be annoying in pipelines, requiring something like this:
$ ( head -1 foo.csv ; cat foo.csv | grep whatever ) | csv2json | jq .
I rarely use the above idiom, but it’s good for when you process enough data to actually have to wait (10s or more) for jq to finish. I’ve found that grep will outperform it by orders of magnitude.
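The same trick applies to JSON logs directly: prefilter with grep so jq only has to parse lines that might match. A sketch (the file and field names are made up, and it assumes one JSON document per line):
$ cat biglog.json | grep '"frew"' | jq 'select(.user.name == "frew")'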
Because of all of these tools, I am often willing to use JSON even if it’s less efficient than something more natural. For example, when querying Athena I will get CSV with a log_date column and a record column. The former is an ISO8601 date and the latter is just JSON. Sure, I could probably use cut to extract the record, but the following works well enough and I suspect works better in cases where the output is “strange:”
$ cat athena.csv | csv2json | jq .record -r | jq .
🔗 yaml2json
This is a tiny tool that I have in my dotfiles (and thus on all servers I connect to) which makes treating YAML like JSON trivial. I suspect the usage is obvious but here it is:
$ cat /etc/salt/grains | yaml2json | jq .
I avoid YAML when possible, but sometimes I have to interact with it, and this helps a lot.
I hope this is helpful! I think if anything, the tooling above should be an encouragement for those on the fence about JSON-oriented logs. The only place where I am not a fan of JSON-oriented logs is directly to the screen, which I am actively working on solving at work and may blog about some other time.
(The following includes affiliate links.)
If you’d like to learn more about this kind of tool, The Linux Command Line would be a good start. Chapter 20 specifically covers this kind of tool, though with more of the usual suspects like cut, sort, uniq, etc.
If you want to improve your foundations, The Unix Programming Environment is an excellent read. It was one of the few tech books in recent memory that I read cover to cover.
Posted Mon, Sep 18, 2017