CSV Databases in Perl

On Monday I wrote about using Amazon Athena from Perl. That’s only step one though, because often I find myself needing to dig further.

Athena provides results as CSV. I have a couple tools I use to interact with CSV. The older, which I’ve mentioned before, is a filter that reads CSV on STDIN and writes JSON on STDOUT. I use it with normal shell tools and jq:

#!/usr/bin/env perl

use strict;
use Text::xSV;
use JSON::XS;

my $CSV = Text::xSV->new();
my $JSON = JSON::XS->new->canonical->utf8;
$CSV->read_header();
while (my $row = $CSV->fetchrow_hash) {
    print $JSON->encode($row) . "\n";
}

Then I might do something like:

cat foo.csv | csv2json | jq .username

The above is handy for little scripts, but what I’ve been enjoying lately is actual SQL, so I have a script to use CSV as a database. I think I’m calling it csvsh. It wraps dbish and leverages DBD::CSV and does some annoying backbends to ease the use of the tool.

#!/usr/bin/env perl

use strict;
use warnings;

use autodie;

use Term::ANSIColor;

my $csv = shift or die "usage: $0 <path-to-csv>\n";

my $dir = $csv =~ s(^(.*/)([^/]+))($1/.$2)r;

unless (-d $dir) {
  mkdir $dir;
  link $csv, "$dir/_";
}

chdir $dir;

open my $fh, '<', '_';
my $header = <$fh>;
close $fh;

print colored(['bold'], "table is _, columns are $header");

system 'dbish', 'dbi:CSV:f_dir=.';

END {
  chdir q(..);
  unlink "$dir/_";
  rmdir $dir;
};

I want to make it support a --sql argument at some point but haven’t gotten around to it yet. Here’s an example session of csvsh:

$ csvsh ~/Downloads/3d5f3e2c-f9eb-48bc-9a3f-628a739e011d.csv
table is _, columns are "eventname","eventsource","eventtype","sourceipaddress","cnt"
Useless localization of scalar assignment at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/
5.26.0/DBI/Format.pm line 377.
DBI::Shell 11.95 using DBI 1.636

WARNING: The DBI::Shell interface and functionality are
=======  very likely to change in subsequent versions!


Connecting to 'dbi:CSV:f_dir=.' as ''...
@dbi:CSV:f_dir=.> SELECT SUM(cnt), eventname FROM _ GROUP BY eventname;
SUM,eventname
68,'DescribeInternetGateways'
4,'GetBucketVersioning'
68,'DescribeApplications'
68,'DescribeDBSecurityGroups'
253,'DescribeAutoScalingGroups'
68,'DescribeSpotInstanceRequests'
460,'DescribeAccountAttributes'
3,'DescribeRegions'
260,'GetSendQuota'
68,'DescribeCacheSecurityGroups'
68,'DescribeEnvironments'
68,'DescribeNetworkInterfaces'
264,'GetAccountSummary'
136,'DescribeAddresses'
68,'DescribeVpcs'
68,'DescribeSubnets'
[ ... ]

It’s a little noisy at startup, but it still works pretty well.


As with Monday’s post I don’t have much I can link to that adds to this information, so I’ll duplicate the stuff I’m interested in this week:

I just started the eigth book in The Malazan Series and as usual I’m enjoying it a lot.

In a totally different vein I just ordered a thermocouple in the hopes that it would give me more insight while roasting my coffee and allow me to make my roasts that much better.

Posted Wed, Jun 14, 2017