CSV Databases in Perl

On Monday I wrote about using Amazon Athena from Perl. That’s only step one though, because often I find myself needing to dig further.

Athena provides results as CSV. I have a couple tools I use to interact with CSV. The older, which I’ve mentioned before, is a filter that reads CSV on STDIN and writes JSON on STDOUT. I use it with normal shell tools and jq:

#!/usr/bin/env perl

use strict;
use Text::xSV;
use JSON::XS;

my $CSV = Text::xSV->new();
my $JSON = JSON::XS->new->canonical->utf8;
$CSV->read_header();
while (my $row = $CSV->fetchrow_hash) {
    print $JSON->encode($row) . "\n";
}

Then I might do something like:

cat foo.csv | csv2json | jq .username

The above is handy for little scripts, but what I’ve been enjoying lately is actual SQL, so I have a script to use CSV as a database. I think I’m calling it csvsh. It wraps dbish and leverages DBD::CSV and does some annoying backbends to ease the use of the tool.

#!/usr/bin/env perl

use strict;
use warnings;

use autodie;

use Term::ANSIColor;

my $csv = shift or die "usage: $0 <path-to-csv>\n";

my $dir = $csv =~ s(^(.*/)([^/]+))($1/.$2)r;

unless (-d $dir) {
  mkdir $dir;
  link $csv, "$dir/_";
}

chdir $dir;

open my $fh, '<', '_';
my $header = <$fh>;
close $fh;

print colored(['bold'], "table is _, columns are $header");

system 'dbish', 'dbi:CSV:f_dir=.';

END {
  chdir q(..);
  unlink "$dir/_";
  rmdir $dir;
};

I want to make it support a --sql argument at some point but haven’t gotten around to it yet. Here’s an example session of csvsh:

$ csvsh ~/Downloads/3d5f3e2c-f9eb-48bc-9a3f-628a739e011d.csv
table is _, columns are "eventname","eventsource","eventtype","sourceipaddress","cnt"
Useless localization of scalar assignment at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/
5.26.0/DBI/Format.pm line 377.
DBI::Shell 11.95 using DBI 1.636

WARNING: The DBI::Shell interface and functionality are
=======  very likely to change in subsequent versions!


Connecting to 'dbi:CSV:f_dir=.' as ''...
@dbi:CSV:f_dir=.> SELECT SUM(cnt), eventname FROM _ GROUP BY eventname;
SUM,eventname
68,'DescribeInternetGateways'
4,'GetBucketVersioning'
68,'DescribeApplications'
68,'DescribeDBSecurityGroups'
253,'DescribeAutoScalingGroups'
68,'DescribeSpotInstanceRequests'
460,'DescribeAccountAttributes'
3,'DescribeRegions'
260,'GetSendQuota'
68,'DescribeCacheSecurityGroups'
68,'DescribeEnvironments'
68,'DescribeNetworkInterfaces'
264,'GetAccountSummary'
136,'DescribeAddresses'
68,'DescribeVpcs'
68,'DescribeSubnets'
[ ... ]

It’s a little noisy at startup, but it still works pretty well.

As with Monday’s post I don’t have much I can link to that adds to this information, so I’ll duplicate the stuff I’m interested in this week:

(The following includes affiliate links.)

I just started the eigth book in The Malazan Series and as usual I’m enjoying it a lot.

In a totally different vein I just ordered a thermocouple in the hopes that it would give me more insight while roasting my coffee and allow me to make my roasts that much better.

Posted Wed, Jun 14, 2017

If you're interested in being notified when new posts are published, you can subscribe here; you'll get an email once a week at the most.