CSV Databases in Perl
On Monday I wrote about using Amazon Athena from Perl. That’s only step one though, because often I find myself needing to dig further.
Athena provides results as CSV. I have a couple tools I use to interact with
CSV. The older, which I’ve mentioned
before,
is a filter that reads CSV on STDIN and writes JSON on STDOUT. I use it with
normal shell tools and jq
:
#!/usr/bin/env perl
use strict;
use Text::xSV;
use JSON::XS;
my $CSV = Text::xSV->new();
my $JSON = JSON::XS->new->canonical->utf8;
$CSV->read_header();
while (my $row = $CSV->fetchrow_hash) {
print $JSON->encode($row) . "\n";
}
Then I might do something like:
cat foo.csv | csv2json | jq .username
The above is handy for little scripts, but what I’ve been enjoying lately is
actual SQL, so I have a script to use CSV as a database. I think I’m calling it
csvsh
. It wraps dbish
and leverages
DBD::CSV and does some annoying backbends
to ease the use of the tool.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Term::ANSIColor;
my $csv = shift or die "usage: $0 <path-to-csv>\n";
my $dir = $csv =~ s(^(.*/)([^/]+))($1/.$2)r;
unless (-d $dir) {
mkdir $dir;
link $csv, "$dir/_";
}
chdir $dir;
open my $fh, '<', '_';
my $header = <$fh>;
close $fh;
print colored(['bold'], "table is _, columns are $header");
system 'dbish', 'dbi:CSV:f_dir=.';
END {
chdir q(..);
unlink "$dir/_";
rmdir $dir;
};
I want to make it support a --sql
argument at some point but haven’t gotten
around to it yet. Here’s an example session of csvsh
:
$ csvsh ~/Downloads/3d5f3e2c-f9eb-48bc-9a3f-628a739e011d.csv
table is _, columns are "eventname","eventsource","eventtype","sourceipaddress","cnt"
Useless localization of scalar assignment at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/
5.26.0/DBI/Format.pm line 377.
DBI::Shell 11.95 using DBI 1.636
WARNING: The DBI::Shell interface and functionality are
======= very likely to change in subsequent versions!
Connecting to 'dbi:CSV:f_dir=.' as ''...
@dbi:CSV:f_dir=.> SELECT SUM(cnt), eventname FROM _ GROUP BY eventname;
SUM,eventname
68,'DescribeInternetGateways'
4,'GetBucketVersioning'
68,'DescribeApplications'
68,'DescribeDBSecurityGroups'
253,'DescribeAutoScalingGroups'
68,'DescribeSpotInstanceRequests'
460,'DescribeAccountAttributes'
3,'DescribeRegions'
260,'GetSendQuota'
68,'DescribeCacheSecurityGroups'
68,'DescribeEnvironments'
68,'DescribeNetworkInterfaces'
264,'GetAccountSummary'
136,'DescribeAddresses'
68,'DescribeVpcs'
68,'DescribeSubnets'
[ ... ]
It’s a little noisy at startup, but it still works pretty well.
As with Monday’s post I don’t have much I can link to that adds to this information, so I’ll duplicate the stuff I’m interested in this week:
(The following includes affiliate links.)
I just started the eigth book in The Malazan Series and as usual I’m enjoying it a lot.
In a totally different vein I just ordered a thermocouple in the hopes that it would give me more insight while roasting my coffee and allow me to make my roasts that much better.
Posted Wed, Jun 14, 2017If you're interested in being notified when new posts are published, you can subscribe here; you'll get an email once a week at the most.