Iterating over Chunks of a Diff in Vim

Every now and then at work I’ll make broad, sweeping changes in the codebase. The most recent was replacing all instances of print STDERR "foo\n" with warn "foo\n" — about 160 instances in all. After discussing it more with my boss, we decided that instead of blindly replacing all those print statements with warns (which, for those who don’t know, are easier to intercept and log) we should log at the right log level.

Enter Quickfix

Quickfix sounds like some kind of bad guy from a slasher movie to me, but it’s actually a super handy feature in Vim. Here’s what the manual says:

Vim has a special mode to speedup the edit-compile-edit cycle. This is inspired by the quickfix option of the Manx’s Aztec C compiler on the Amiga. The idea is to save the error messages from the compiler in a file and use Vim to jump to the errors one by one. You can examine each problem and fix it, without having to remember all the error messages.

More concretely, the quickfix commands end up giving the user a list of locations. I tend to use the quickfix list most commonly with Fugitive. You can run the command :Ggrep foo and the quickfix list will contain all of the lines that git found containing foo. Then, to iterate over those locations you can use :cnext, :cprev, :cwindow, and many others, to interact with the list.

I have wanted a way to populate the quickfix list with the locations of all of the chunks that are in the current modified files for a long time, and this week I decided to finally do it.

First off, I wrote a little tool to parse diffs and output locations:

#!/usr/bin/env perl

use strict;
use warnings;

my $filename;
my $line;
my $offset = 0;
my $printed = 0;
while (<STDIN>) {
   if (m(^\+\+\+ b/(.*)$)) {
      $printed = 0;
      $filename = $1;
   } elsif (m(^@@ -\d+(?:,\d+)? \+(\d+))) {
      $line = $1;
      $offset = 0;
      $printed = 0;
   } elsif (m(^\+(.*)$)) {
      my $data = $1 || '-';
      print "$filename:" . ($offset + $line) . ":$data\n"
         unless $printed;
      $printed = 1;
   } elsif (m(^ )) {
      $printed = 0;
   }
}
The general usage is something like: git diff | diff-hunk-list, and the output will be something like:

app/lib/ZR/Plack/Middleware/  local $SIG{USR1} = sub {
bin/zr-plack-reaper:29:sub timeout { 120 }

The end result is a location for each new set of lines in a given diff. That means deleted lines will not be included; supporting those would take another tool, or more options for this one.

Then, I added the following to my vimrc:

command Gdiffs cexpr system('git diff \| diff-hunk-list')

So now I can simply run :Gdiffs and iterate over all of my changes, possibly tweaking them along the way!

Super Secret Bonus Content

The Quickfix is great, but there are a couple other things that I think really round out the functionality.

First: the quickfix list is global per session, so if you run :Gdiffs and then :Ggrep to look at some other code, you’ve blown away the original quickfix list. There’s another list, the location list, which is scoped to a window. It’s also very useful, and its commands tend to start with l instead of c.
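If you’d rather put the diff hunks in the window-local location list, the same trick should work with :lexpr instead of :cexpr (an untested sketch, named Ldiffs here by analogy):

command Ldiffs lexpr system('git diff \| diff-hunk-list')

Then ]l and :lwindow operate on just that window’s list, leaving the quickfix list free for :Ggrep.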

Second: There is another Tim Pope plugin called unimpaired which adds a ton of useful mappings, including [q and ]q to go back and forth in the quickfix list, and [l and ]l to do the same in the location list. The plugin does way more than those two things, but those are what I use it for the most.

Posted Wed, May 25, 2016

OSCON 2016

ZipRecruiter, where I work, generously pays for each engineer to go to at least one conference a year. I have gone to YAPC every year since 2009 and would not skip it, except my wife is pregnant with our second child and will be due much too close to this year’s YAPC (or should I say instead: The Perl Conference?) for me to go.

There were a lot of conferences that I wanted to check out; PyCon, Monitorama, etc etc, but OSCON was the only one that I could seem to make work with my schedule. I can only really compare OSCON to YAPC and, to a lesser extent, SCALE and the one time I went to the ExtJS conference (before the company was called Sencha,) so my comparisons may be a little weird.

Something Corporate

OSCON is a super corporate conference, despite the fact that its name includes Open Source. For the most part this is fine; it means that there is a huge amount of swag (more on that later,) lots of networking to be done, and many free meals. On the other hand OSCON is crazy expensive; I would argue not worth the price. I got the lowest tier, since my wife didn’t want me to be gone for the full four days (and probably six including travel,) and it cost me a whopping twelve hundred dollars. Of course ZipRecruiter reimbursed me, but for reference, YAPC normally costs $200 max.

On top of that there were what are called “sponsored talks.” I was unfamiliar with this concept but the basic idea is that a company can pay a lot of money and be guaranteed a slot, often a keynote, to sort of shill their wares. I wouldn’t mind this if it weren’t for the fact that these talks, as far as I could tell, were universally bad. The one that stands out the most was from IBM, with this awesome line (paraphrased:)

Oh if you don’t use you’re not really an engineer. Maybe go back and read some more Knuth.


Swag

At YAPC you tend to get 1-3 shirts, some round tuits, and maybe some stickers. At OSCON I avoided shirts and still ended up with six; I also got a pair of socks, a metal bottle, a billion pretty awesome stickers, a coloring book, three stress toys, and a THOUSAND DOLLAR SKATEBOARD. To clarify, not everyone got the skateboard; the deal was that you had to make a Heroku account (get socks!), run a node app on your laptop (get a shirt!), and then push it up to Heroku (get entered into the drawing!) Most people gave up at step two because they had tablets or something, but I did it between talks because it was all super easy on my laptop. I was actually third in line after the drawing, but first and second never showed. Awesome!

The Hallway Track

For me the best part of any conference is what is lovingly called “the hallway track.” The idea is that the hallway, where socializing and networking happen, is equally important to all the other tracks (like DevOps, containers, or whatever.) I really enjoy YAPC’s hallway track, though a non-trivial reason is that I already have many friends in the Perl and (surprisingly distinct) YAPC world. On top of that YAPC tends to be in places that are very walkable, so it’s easy to go to a nice restaurant or bar with new friends.

I was pleasantly surprised by the OSCON hallway track. It was not as good as YAPC’s, but it was still pretty awesome. Here are my anecdotes:

Day 1 (Wed)

At lunch I hung out with Paul Fenwick and a few other people, which was pretty good. Chatting with Paul is always great and of course we ended up talking about ExoBrain and my silly little pseudoclone: Lizard Brain.

At dinner I decided to take a note from Fitz Elliot’s book, who once approached me after I did a talk and hung out with me a lot during the conference. I had a lot of good conversations with Fitz and I figured that maybe I could be half as cool as him and do the same thing. The last talk I went to was about machine learning and the speaker, Andy Kitchen, swerved into philosophy a few times, so I figured we’d have a good time and get along if I didn’t freak him out too much by asking if we could hang out. I was right, we (him, his partner Laura Summers, a couple other guys, and I) ended up going to a restaurant and just having a generally good time. It was pretty great.

Day 2 (Thu)

At lunch on Thursday I decided to sit at the Perl table and see who showed up. Randal Schwartz, who I often work with, was there, which was fun. A few other people were there. Todd Rinaldo springs to mind. I’ve spoken to him before, but this time we found an important common ground in trying to reduce memory footprints. I hope to collaborate with him to an extent and publish our results.

Dinner was pretty awesome. I considered doing the same thing I did on Wednesday, but I thought it’d be hugely weird to ask the girl who did the last talk I saw if she wanted to get dinner. That means something else, usually. So I went to the main area where people were sorta congregating and went to greet some of the Perl people that I recognized (Liz, Wendy, David H. Adler.) They were going to Max’s Wine Bar and ended up inviting me, along with another girl whose name I sadly cannot remember. Larry Wall (who invented Perl,) his wife Gloria, and one of his sons joined us, which was pretty fun. At the end of dinner (after I shared an amazing pair of desserts with Gloria) Larry and Wendy fought over who would pay the bill, and Larry won. This is always pretty humbling and fun. The punchline was that the girl who came with us didn’t know who Larry was, because she was mostly acquainted with Ruby. When Wendy told her, there were many pictures taken. It was great.

Day 3 (Fri)

Most of Friday I tried to chill and recuperate. I basically slept, packed, went downtown to get lunch and coffee, and then waited for a cab to the airport. Then when I got to the airport I was noticed by another OSCON attendee (Julie Gunderson) because I was carrying the giant branded skateboard. She was hanging out with AJ Bowen and Jérôme Petazzoni, and they were cool with me tagging along with them to get a meal before we boarded the plane. It’s pretty cool that we were able to have a brief last hurrah after the conference was completely over.


One thing that I was pretty disappointed in was the general reaction when I mentioned that I use Perl. I have plenty of friends in Texas who think poorly of Perl, but I had assumed that was because they mostly worked on closed source software. The fact that a conference that was originally called The Perl Conference would end up encouraging such an anti-Perl attitude is very disheartening.

Don’t get me wrong, Perl is not perfect, but linguistic rivalries only alienate people. I would much rather you tell me some exciting thing you did with Ruby than say “ugh, why would someone build a startup on Perl?” I have a post in the queue about this, so I won’t say a lot more about this. If you happen to read this and are a hater, maybe don’t be a hater.

Overall the conference was a success for me. If I had to choose between a large conference like OSCON and a small conference like YAPC, I’d choose the latter. At some point I’d like to try out the crazy middle ground of something like DefCon where it’s grass roots but not corporate. Maybe in a few years!

Posted Fri, May 20, 2016

Faster DBI Profiling

Nearly two months ago I blogged about how to do profiling with DBI, which of course was about the same time we did this at work.

At the same time there was a non-trivial slowdown in some pages on the application. I spent some time trying to figure out why, but never made any real progress. On Monday of this week Aaron Hopkins pointed out that we had set $DBI::Profile::ON_DESTROY_DUMP to an empty code reference. If you take a peek at the code you’ll see that setting this to a coderef is much less efficient than it could be.

So the short solution is to set $DBI::Profile::ON_DESTROY_DUMP to false.

A better solution is to avoid the global entirely by making a simple subclass of DBI::Profile. Here’s how I did it:

package MyApp::DBI::Profile;

use strict;
use warnings;

use parent 'DBI::Profile';

sub DESTROY {}

1;


This correctly causes the destructor to do nothing, and allows us to avoid setting globals. If you are profiling all of your queries like we are, you really should do this.
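Attaching the subclass then looks something like the following fragment (the !Statement path key is just one example; DBI::Profile documents the full "path/Package" syntax):

$dbh->{Profile} = '!Statement/MyApp::DBI::Profile';

After that, profile data accumulates exactly as before, but destroying the handle no longer triggers a dump.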

Posted Wed, May 18, 2016

Setting up Let's Encrypt and Piwik

Late last week I decided that I wanted to set up Piwik on my blog. I’ll go into how to do that later in the post, but first I ran into a frustrating snag: I needed another TLS certificate. Normally I use StartSSL, because I’ve used them in the past, and I actually started to attempt to go down the path of getting another certificate through them this time, but I ran into technical difficulties that aren’t interesting enough to go into.

Let’s Encrypt

I decided to finally bite the bullet and switch to Let’s Encrypt. I’d looked into setting it up before but the default client was sorta heavyweight, needing a lot of dependencies installed and maybe more importantly it didn’t support Apache. On Twitter at some point I read about acmetool, a much more predictable tool with automated updating of certificates built in. Here’s how I set it up:

Install acmetool

I’m on Debian, but since it’s a static binary, as the acmetool documentation states, the Ubuntu repository also works:

sudo sh -c \
  "echo 'deb xenial main' > \
sudo apt-key adv \
  --keyserver \
  --recv-keys 9862409EF124EC763B84972FF5AC9651EDB58DFA
sudo apt-get update
sudo apt-get install acmetool

Configure acmetool

First I ran sudo acmetool quickstart. My answers were:

  • 1, to use the Live Let’s Encrypt servers
  • 2, to use the PROXY challenge requests

And I think it asked to install a cronjob, which I said yes to.

Get some certs

This is assuming you have your DNS configured so that your hostname resolves to your IP address. Once that’s the case you should simply be able to run this command to get some certs:

sudo acmetool want \ \ \

Configure Apache with the certs

There were a couple little things I had to do to get multiple certificates (SNI) working on my server. First off, /etc/apache2/ports.conf needs to look like this:

NameVirtualHost *:443
Listen 443

Note that my server is TLS only; if you support unencrypted connections obviously the above will be different.

Next, edit each site that you are enabling. So for example, my /etc/apache2/sites-available/piwik looks like this:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost

        SSLEngine on
        SSLCertificateFile      /var/lib/acme/live/
        SSLCertificateKeyFile   /var/lib/acme/live/
        SSLCertificateChainFile /var/lib/acme/live/

        ProxyPass "/.well-known/acme-challenge" ""
        DocumentRoot /var/www/piwik
        <Location />
                Order allow,deny
                allow from all
        </Location>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn

        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

I really like that the certificate files end up in a place that is predictable and clear.

After doing the above configuration, you should be able to restart apache (sudo /etc/init.d/apache2 restart), access your website, and see it using a freshly minted Let’s Encrypt certificate.

Configure auto-renewal

Let’s Encrypt certificates do not last very long at all. Normally a cheap or free certificate will last a year, a more expensive one will last two years, and some special expensive EV certs can last longer, with I think a normal max of five? The Let’s Encrypt ones last ninety days. With an expiration so often, automation is a must. This is where acmetool really shines. If you allowed it to install a cronjob it will periodically renew certificates. That’s all well and good but your server needs to be informed that a new certificate has been installed. The simplest way to do this is to edit the /etc/default/acme-reload file and set SERVICES to apache2.
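In a Debian-style defaults file that’s a one-line change; the file should end up looking something like:

# /etc/default/acme-reload
SERVICES="apache2"

acmetool’s renewal hook reads that variable and reloads each listed service after installing a fresh certificate.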


Piwik

The initiator of all of the above was to set up Piwik. If you haven’t heard of Piwik, it’s basically a locally hosted Google Analytics. The main benefit is that people who use various ad-blockers and privacy tools will not be blocking you, and reasonably so, as your analytics will not leave your server.

The install was fairly straightforward. The main thing I did was follow the instructions here and then, when it came to the MySQL step, I ran the following commands as the mysql root user (mysql -u root -p):

CREATE USER 'piwik'@'localhost' IDENTIFIED BY 'somepassword';
CREATE DATABASE piwik;
USE piwik;
GRANT ALL PRIVILEGES ON *.* TO 'piwik'@'localhost';

So now that I have Piwik I can see interesting information much more easily than before, where I wrote my own little tools to parse access logs. Pretty neat!

Posted Sat, May 14, 2016

Rage Inducing Bugs

I have run into a lot of bugs lately. Maybe it’s actually a normal amount, but these bugs, especially taken together, have caused me quite a bit of rage. Writing is an outlet for me and at the very least you can all enjoy the show, so here goes!

X11 Text Thing

I tweeted about this one a few days ago. The gist is that, sometimes, when going back and forth from suspend, font data in video memory gets corrupted. I have a theory that it has to do with switching from X11 to DRI (the old school TTYs), but it is not the most reproducible thing in the world, so this is where I’ve had to leave it.


Firefox

I reported a bug against Firefox recently about overlay windows not getting shown. There is a workaround for this, and the Firefox team (or at least a Firefox individual, Karl Tomlinson) has been willing to look at my errors and have a dialog with me. I have a sinking feeling that this could be a kernel driver bug or maybe a GTK3 bug, but I have no idea how to verify that.


Vim Crashes

Vim has been crashing on my computer for a while now. I turned on coredumps for gvim only, so that I could repro it, about three weeks ago, and I finally got the coveted core yesterday. I dutifully inspected the core dump with gdb and got this:

Program terminated with signal SIGBUS, Bus error.
#0  0x00007fa650515757 in ?? ()

Worthless. I knew I needed debugging symbols, but it turns out there is no vim-dbg. A while ago (like, eight years ago) Debian (and thus Ubuntu) started storing debugging symbols in a completely separate repository. Thankfully a Debian developer, Niels Thykier, was kind enough to point this out, and I was able to install the debugging symbols. If you want to do that yourself you can follow the instructions here, but I have to warn you: you will get errors, because I don’t think Ubuntu has put much effort into making this work well.

After installing the debugging symbols I got this much more useful backtrace:

#0  0x00007fa650515757 in kill () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555fad98c273 in may_core_dump () at os_unix.c:3297
#2  0x0000555fad98dd20 in may_core_dump () at os_unix.c:3266
#3  mch_exit (r=1) at os_unix.c:3263
#4  <signal handler called>
#5  in_id_list (cur_si=<optimized out>, cur_si@entry=0x555fb0591700, list=0x6578655f3931313e, 
    ssp=ssp@entry=0x555faf7497a0, contained=0) at syntax.c:6193
#6  0x0000555fad9fb902 in syn_current_attr (syncing=syncing@entry=0, displaying=displaying@entry=0, 
    can_spell=can_spell@entry=0x0, keep_state=keep_state@entry=0) at syntax.c:2090
#7  0x0000555fad9fc1b4 in syn_finish_line (syncing=syncing@entry=0) at syntax.c:1781
#8  0x0000555fad9fcd3f in syn_finish_line (syncing=0) at syntax.c:758
#9  syntax_start (wp=0x555faf633720, lnum=3250) at syntax.c:536
#10 0x0000555fad9fcf45 in syn_get_foldlevel (wp=0x555faf633720, lnum=lnum@entry=3250) at syntax.c:6546
#11 0x0000555fad9167e9 in foldlevelSyntax (flp=0x7ffe2b90beb0) at fold.c:3222
#12 0x0000555fad917fe8 in foldUpdateIEMSRecurse (gap=gap@entry=0x555faf633828, level=level@entry=1, 
    startlnum=startlnum@entry=1, flp=flp@entry=0x7ffe2b90beb0, 
    getlevel=getlevel@entry=0x555fad9167a0 <foldlevelSyntax>, bot=bot@entry=7532, topflags=2)
    at fold.c:2652
#13 0x0000555fad918dbf in foldUpdateIEMS (bot=7532, top=1, wp=0x555faf633720) at fold.c:2292
#14 foldUpdate (wp=wp@entry=0x555faf633720, top=top@entry=1, bot=bot@entry=2147483647) at fold.c:835
#15 0x0000555fad919123 in checkupdate (wp=wp@entry=0x555faf633720) at fold.c:1187
#16 0x0000555fad91936a in checkupdate (wp=0x555faf633720) at fold.c:217
#17 hasFoldingWin (win=0x555faf633720, lnum=5591, firstp=0x555faf633798, lastp=lastp@entry=0x0, 
    cache=cache@entry=1, infop=infop@entry=0x0) at fold.c:158
#18 0x0000555fad91942e in hasFolding (lnum=<optimized out>, firstp=<optimized out>, 
    lastp=lastp@entry=0x0) at fold.c:133
#19 0x0000555fad959c3e in update_topline () at move.c:291
#20 0x0000555fad9118ee in buf_reload (buf=buf@entry=0x555faf25e210, orig_mode=orig_mode@entry=33204)
    at fileio.c:7155
#21 0x0000555fad911d0c in buf_check_timestamp (buf=buf@entry=0x555faf25e210, focus=focus@entry=1)
    at fileio.c:6997
#22 0x0000555fad912422 in check_timestamps (focus=1) at fileio.c:6664
#23 0x0000555fada1091b in ui_focus_change (in_focus=<optimized out>) at ui.c:3203
#24 0x0000555fad91fd96 in vgetc () at getchar.c:1670
#25 0x0000555fad920019 in safe_vgetc () at getchar.c:1801
#26 0x0000555fad96e775 in normal_cmd (oap=0x7ffe2b90c440, toplevel=1) at normal.c:627
#27 0x0000555fada5d665 in main_loop (cmdwin=0, noexmode=0) at main.c:1359
#28 0x0000555fad88d21d in main (argc=<optimized out>, argv=<optimized out>) at main.c:1051

I am already on the vim mailing list, so I sent an email, and I can see responses (though sadly not CC’d to me) as I write this post, so hopefully this will be resolved soon.

Linux Kernel Bugs

I found a bug in the Linux Kernel, probably related to the nvidia drivers, but I’m not totally sure. I’d love for this to get resolved, though reporting kernel bugs to Ubuntu has not gone well for me in the past.

Vim sessions

The kernel bug above causes the computer to crash during xrandr events; this means that I end up with vim writing a fresh new session file during the event (thanks to the excellent Obsession by Tim Pope) and the session file getting hopelessly corrupted, because it fails midwrite.

I foolishly mentioned this in the #vim channel on freenode and was reminded how often IRC channels are actually unrelated to aptitude. The people in the channel seemed to think that if the kernel crashes, there is nothing a program can do to avoid losing data. I will argue that while it is hard, it is not impossible. The most basic thing that can and should be done is:

  1. Write to a tempfile
  2. Rename the tempfile to the final file

This should be atomic and safe. There are many ways that dealing with files can go wrong, but to believe it is impossible to protect against them is unimpressive, to say the least.
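A minimal sketch of that pattern in Perl, using only core modules (the file name and contents here are invented for illustration):

use strict;
use warnings;
use File::Basename 'dirname';
use File::Temp ();

sub atomic_write {
   my ($path, $contents) = @_;

   # 1. Write to a tempfile in the same directory as the target,
   #    because rename(2) is only atomic within one filesystem.
   my $tmp = File::Temp->new( DIR => dirname($path), UNLINK => 0 );
   print $tmp $contents or die "write failed: $!";
   close $tmp           or die "close failed: $!";

   # 2. Rename the tempfile over the final file; a reader sees either
   #    the old complete file or the new complete file, never a mix.
   rename $tmp->filename, $path or die "rename failed: $!";
}

atomic_write('Session.vim', "let g:session_loaded = 1\n");

If the process (or the whole machine) dies mid-write, the worst case is a stray tempfile; the session file itself is never left half-written.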

I will likely submit the above as a proper bug to the Vim team tomorrow. In the meantime this must also be done in Obsession, and I have submitted a small patch to do what I outlined above. I’m battle testing it now and will know soon if it resolves the problem.

I feel better. At the very least I’ve submitted bugs, and in one of the most annoying cases, been able to submit a patch. When you run into a bug, why not do the maintainer a solid and report it? And if you can, fix it!

Posted Tue, May 10, 2016

Putting MySQL in Timeout

At work we are working hard to scale our service to serve more users and have fewer outages. Exciting times!

One of the main problems we’ve had since I arrived is that MySQL 5.6 doesn’t really support query timeouts. It has stall timeouts, but if a query takes too long there’s not a great way to cancel it. I worked on resolving this a few months ago and was disappointed that I couldn’t seem to come up with a good solution that was simple enough to not scare me.

A couple weeks ago we hired a new architect (Aaron Hopkins) and he, along with some ideas from my boss, Bill Hamlin, came up with a pretty elegant and simple way to tackle this.

The solution is in two parts, the client side, and a reaper. On the client you simply set a stall timeout; this example is Perl but any MySQL driver should expose these connection options:

my $dbh = DBI->connect('dbi:mysql:...', 'zr', $password, {
   mysql_read_timeout  => 2 * 60,
   mysql_write_timeout => 2 * 60,
});

This will at the very least cause the client to stop waiting if the database disappears. If the client is doing a query and pulling rows down over the course of 10 minutes, but is getting a new row every 30s, this will not help.

To resolve the above problem, we have a simple reaper script:


#!/usr/bin/env perl

use strict;
use warnings;

use DBI;
use Getopt::Long::Descriptive;
use JSON;
use Linux::Proc::Net::TCP;
use Sys::Hostname;

my ($opt, $usage) = describe_options(
   'mysql-reaper %o',
   [ 'noaction|n', 'report queries that would be killed, without killing them' ],
);

my $actual_host = hostname();

my $max_timeout = 2 * 24 * 60 * 60;
$max_timeout = 2 * 60 * 60 if $actual_host eq 'db-master';

my $dbh = DBI->connect(
   'dbi:mysql:...', '...', '...', {
      RaiseError          => 1,
      mysql_read_timeout  => 30,
      mysql_write_timeout => 30,
   },
);

my $sql = <<'SQL';
SELECT pl.id, pl.host, pl.time, pl.info
  FROM information_schema.processlist pl
 WHERE pl.command NOT IN ('Sleep', 'Binlog Dump') AND
       pl.user NOT IN ('root', 'system user') AND
       pl.time >= 2 * 60
SQL

while (1) {
  my $sth = $dbh->prepare_cached($sql);
  $sth->execute;

  my $connections;

  while (my $row = $sth->fetchrow_hashref) {
    kill_query($row, 'max-timeout') if $row->{time} >= $max_timeout;

    if (my ($json) = ($row->{info} =~ m/ZR_META:\s+(.*)$/)) {
      my $data = decode_json($json);

      kill_query($row, 'web-timeout') if $data->{catalyst_app};
    }

    $connections ||= live_connections();
    kill_query($row, 'zombie') unless $connections->{$row->{host}};
  }

  sleep 1;
}

sub kill_query {
  my ($row, $reason) = @_;

  warn sprintf "killing «%s», reason %s\n", $row->{info}, $reason;
  $dbh->do("KILL CONNECTION ?", undef, $row->{id}) unless $opt->noaction;
}

sub live_connections {
  my $table = Linux::Proc::Net::TCP->read;

  return +{
    map { $_->rem_address . ':' . $_->local_port => 1 } $table->listeners
  };
}

There are a lot of subtle details in the above script; so I’ll do a little bit of exposition. First off, the reaper runs directly on the database server. We define the absolute maximum timeout based on the hostname of the machine, with 2 days being the timeout for reporting and read-only minions, and 2 hours being the timeout for the master.

The SQL query grabs all running tasks, but ignores a certain set of tasks. Importantly, we have to whitelist a couple users because one (root) is where extremely long running DDL takes place and the other (system user) is doing replication, basically constantly.

We iterate over the returned queries, immediately killing those that took longer than the maximum timeout. Any queries that our ORM (DBIx::Class) generated have a little bit of logging appended as a comment with JSON in it. We can use that to tweak the timeout further; initially by choking down web requests to a shorter timeout, and later we’ll likely allow users to set a custom timeout directly in that comment.
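Based on the regex in the reaper, such an annotated query might look like the following (the table name and app name here are invented; only the ZR_META comment shape comes from the script):

SELECT * FROM job -- ZR_META: {"catalyst_app":"ZR::Web"}

The reaper pulls the JSON out of that trailing comment and, seeing catalyst_app, applies the shorter web timeout.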

Finally, we kill queries whose client has given up the ghost. I did a test a while ago where I started a query and then killed the script doing the query, and I could see that MySQL kept running the query. I can only assume that this is because it could have been some kind of long running UPDATE or something. I expect the timeouts will be the main cause of query reaping, but this is a nice stopgap that could pare down some pointless crashed queries.

I am very pleased with this solution. I even think that if we eventually switch to Aurora all except the zombie checking will continue to work.

Posted Sun, May 8, 2016

A new Join Prune in DBIx::Class

At work a coworker and I recently went on a rampage cleaning up our git branches. Part of that means I need to clean up my own small pile of unmerged work. One of those branches is an unmerged change to our subclass of the DBIx::Class Storage Layer to add a new kind of join prune.

If you didn’t know, good databases can avoid doing joins entirely by looking at the query and seeing where (or whether) the joined-in table is used at all. DBIx::Class does the same thing, for databases that do not have such machinery built in. In fact there was a time when it could prune certain kinds of joins that even the lauded PostgreSQL could not, though that may no longer be the case.

The rest of what follows in this blog post is a very slightly tidied up commit message of the original branch. Enjoy!

Recently Craig Glendenning found a query in the ZR codebase that was using significant resources; the main problem was that it included a relationship but didn’t need to. We fixed the query, but I was confused because DBIx::Class has a built in join pruner and I expected it to have transparently solved this issue.

It turns out we found a new case where the join pruner can apply!

If you have a query that matches all of the following conditions:

  • a relationship is joined with a LEFT JOIN
  • that relationship is not in the WHERE
  • that relationship is not in the SELECT
  • the query is limited to one row

You can remove the matching relationship. The WHERE and SELECT conditions should be obvious: if a relationship is used in the WHERE clause, you need it to be joined for the WHERE clause to be able to match against the column. Similarly, for the SELECT clause the relationship must be included so that the column can actually be referenced in the SELECT clause.

The one row and LEFT JOIN conditions are more subtle; but basically consider this case:

You have a query with a limit of 2 and you join in a relationship that has zero or more related rows. If you get back zero rows for all of the relationships, the root table will basically be returned and you’ll just get the first two rows from that table. But consider if you got back two related rows for each row in the root table: you would only get back the first row from the root table.

Similarly, the reason that LEFT is specified is that with a standard INNER JOIN, the relationship filters the root table based on the join condition.

If you specify a single row, when a relationship is LEFT it is not filtering the root table, and the “exploding” nature of relationships does not apply, so you will always get the same row.
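As a concrete, hypothetical example (the resultset and relationship names are invented), a query shaped like this meets all four conditions, so the LEFT JOIN can be dropped:

# 'customer' is a left-joined relationship that appears in neither the
# WHERE clause nor the select list, and rows => 1 limits the query to a
# single row, so the join prune can remove the LEFT JOIN entirely.
my $order = $schema->resultset('Order')->search(
   undef,
   {
      join => 'customer',
      rows => 1,
   },
)->single;

If you instead filtered on customer columns, selected them, or asked for more than one row, the join would have to stay.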

I’ve pushed the change that adds the new join prune to GitHub, and notified the current maintainer of DBIx::Class in the hopes that it can get merged in for everyone to enjoy.

Posted Fri, Apr 29, 2016

Python: Taking the Good with the Bad

For the past few months I’ve been working on a side project using Python. I’ll post about that project some other time, but now that I’ve used Python a little bit I think I can more reasonably consider it (so not just “meaningful whitespace?!?“)

It’s much too easy to write a bunch of stuff that is merely justification of the status quo (in my case that is the use of Perl.) I’m making an effort to consider all of the good things about Python and only mentioning Perl when there is a lack. I’d rather not compare them at all, but I don’t see a way around that without silly mental trickery.

Note that this is about Python 2. If you want to discuss Python 3, let’s compare it to Perl 6.

Generally awesome stuff about Python

The following are my main reasons for liking Python. They are in order of importance, and some have caveats.


Generators

Generators (a limited form of coroutines) are an awesome linguistic feature. It took me a long time to understand why they are useful, but I think I can summarize it easily now:

What if you wanted to have a function with an infinite loop in the middle?

In Perl, the typical answer might be to build an iterator. This is fine, but it can be a lot of work. In Python, you just write normal code plus a special keyword, yield. For simple stuff, the closures you have available in Perl will likely seem less magic. But for complicated things, like iterating over the nodes in a tree, Python will almost surely be easier.
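For example, here is the “function with an infinite loop in the middle” case as a small sketch (this form runs under both Python 2 and 3):

```python
def naturals():
    # An infinite loop in the middle of a function: each yield
    # suspends execution and hands one value back to the caller.
    n = 0
    while True:
        yield n
        n += 1

numbers = naturals()
print(next(numbers))  # 0
print(next(numbers))  # 1
```

The loop never terminates, yet the function is perfectly usable because the caller pulls values one at a time.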

Let me be clear: in my mind, generators are an incredibly important feature and that Perl lacks them is significant and terrible. There are efforts to get them into core, and there is a library that implements them, but it is not supported on the newest versions of Perl.


Builtin Collections

Structured data is one of the most important parts of programming. Arrays are super important; I think that’s obvious. Hashes are, in my opinion, equally useful. Most other types of collections are arguably past the point of diminishing returns once hashes are well within reach, but a few are included in Python and I think that’s a good thing. To clarify, in Python one could write:

cats = set(['Dantes', 'Sunny Day', 'Wheelbarrow'])
tools = set(['Hammer', 'Screwdriver', 'Wheelbarrow'])

print cats.intersection(tools)

In Perl that can be done with a hash, but it’s a hassle, so I tend to use Set::Scalar.

Python also ships with an OrderedDict, which is like Perl’s Tie::IxHash. But Tie::IxHash is sorta aging and weird and what’s with that name?

A Python programmer might also mention that defaultdict is cool. I’d argue that defaultdict merely works around Python’s insistence that the programmer be explicit about a great many things. That is: it is a workaround for Pythonic dogma.
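
For balance, here is the kind of boilerplate it saves; a sketch of a word count with collections.defaultdict:

```python
from collections import defaultdict

counts = defaultdict(int)  # missing keys spring into existence as int() == 0
for word in ['spam', 'eggs', 'spam']:
    counts[word] += 1

print(dict(counts))  # {'spam': 2, 'eggs': 1}
```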

Rarely need a compiler for packages

In my experience, only very rarely do libraries need to be compiled in Python. Obviously math intensive stuff like crypto or high precision stuff will need a compiler, but the vast majority of other things do not. I think part of the reason for this is that Python ships with an FFI library (ctypes). So awesome.
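
As a taste of that FFI, here is a sketch that calls libc’s abs through ctypes with no compiler involved (Unix only; CDLL(None) loads the symbols already linked into the running process):

```python
import ctypes

libc = ctypes.CDLL(None)  # Unix only: dlopen(NULL), i.e. the current process
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]
print(libc.abs(-5))  # 5
```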

In Perl, even the popular OO framework Moose requires a compiler!

Subclassable Builtins

If you want to define your own weird kind of dictionary in Python, it’s really easy: you subclass dict and define around ten methods. It will all just work. This applies to all of Python’s builtins, I believe.
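
A sketch of the idea; this hypothetical dict folds its keys to lowercase (three methods cover the common paths, though update and friends bypass them unless you override those too):

```python
class LowerDict(dict):
    """A hypothetical dict that folds keys to lowercase."""
    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())
    def __contains__(self, key):
        return dict.__contains__(self, key.lower())

d = LowerDict()
d['Dantes'] = 'cat'
print(d['DANTES'])  # cat
```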

In Perl, you have to use tie, which is similar but you can end up with oddities related to Perl’s weird indirect method syntax. Basically, often things like print $fhobject $str will not work as expected. Sad camel.

Interactive Python Shell

Python ships with an excellent interactive shell, which can be used by simply running python. It has line editing, history, builtin help, and lots of other handy tools for testing out little bits of code. I have lots of little tools to work around the lack of a good interactive shell in Perl. This is super handy.

Simple Syntax

The syntax of Python can be learned by a seasoned programmer in an afternoon. Awesome.

Cool, weird projects

I’ll happily accept more examples for this. A few spring to mind:

  1. BCC is sorta like DTrace, but for Linux.
  2. PyNES lets you run NES games written in Python.
  3. BITS is a Python based operating system, for doing weird hardware stuff without having to write C.

Batteries Included

Python ships with a lot of libraries, like the builtins above, that are not quite so generic. Some examples that I’ve used include a netrc parser, an IMAP client, some email parsing tools, and some stuff for building and working with iterators. The awesome thing is that I’ve written some fairly handy tools that in Perl would have certainly required me to reach for CPAN modules.

What’s not so awesome is that the libraries are clearly not of the high quality one would desire. Here are two examples:

First, the core netrc library can only select by host, instead of host and account. This was causing a bug for me when using OfflineIMAP. I rolled up my sleeves, cloned cpython, fixed the bug, and then found that it had been reported, with a patch, five years ago. Not cool.

Second, the builtin email libraries are pretty weak. To get the content of a header I had to use the following code:

import email.header
import re

decoded_header = str(email.header.make_header(email.header.decode_header(header)))
unfolded_header = re.sub('[\r\n]', '', decoded_header)

I’m not impressed.

There are more examples, but this should be sufficient.

Now before you jump on me as a Perl programmer: Perl definitely has some weak spots in its included libraries, but unlike with Python, the vast majority of those are actually on CPAN and can be updated without updating Perl. Unless I am missing something, that is not the case with the Python core libraries.

Prescriptivism

The Python community as a whole, or at least my interaction with it, seems to be fairly intent on defining the one-and-true way to do anything. This is great for new programmers, but I find it condescending and unhelpful. I like to say that the following are all the programmer’s creed (stolen from various media):

That which compiles is true.

Nothing is True and Everything is Permissible

“Considered Harmful” Considered Harmful

Generally not awesome stuff about Python

As before, these are things that bother me about Python, in order.

Variable Scope and Declaration

Python seems to aim to be a boring but useful programming language. Like Java, but a scripting language. This is a laudable goal and I think Go is the newest in this tradition. Why would a language that intends to be boring have any scoping rules that are not strictly and exclusively lexical? If you know, tell me.

In Perl, the following code would not even compile:

use strict;

sub print_x { print("$x\n") }
my $x = 1;

In Python, it does what a crazy person would expect:

def foo():
    print(x)

x = 1
foo()

The real problem here is that in Python, variables are never declared. It is not an error to set x = 1 in Python, how else would you create the variable? In Perl, you can define a variable as lexical with my, global with our, and dynamic with local. Python is a sad mixture of lexical and global. The fact that anyone would ever need to explain scoping implies that it’s pretty weird.
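
The mixture bites hardest when a function assigns to a name it also reads; the assignment silently makes the name local for the whole function body (a sketch):

```python
x = 1

def bump():
    # assigning to x anywhere in the function makes x local everywhere
    # in it, so the read on the right-hand side blows up
    x = x + 1

try:
    bump()
except UnboundLocalError as e:
    print('UnboundLocalError:', e)
```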

PyPI and (the lack of) friends

I would argue that since the early 2000’s, a critical part of a language is its ecosystem. A language with no libraries makes for lonely, dreary work. Python has plenty of libraries, but the actual web presence of the ecosystem is crazily fractured. Here are some things that both search.cpan.org and MetaCPAN do that PyPI does not:

  • Include and render all of the documentation for all modules (example)
  • Include a web accessible version of all (or almost all) releases of the code (example, example)

And MetaCPAN does a ton more. There’s also a constellation of other tools; here are my favorites:

  • CPANTesters aggregates the test results of individuals and smoke machines of huge amounts of CPAN on a ton of operating systems. Does your module run on Solaris?
  • RT is a powerful issue tracker that creates a queue of issues for every module released on CPAN. Nowadays with Github that’s not as important as it used to be, but even with Github, RT still allows you to create issues without needing to log in.

Documentation

This is related to my first complaint about PyPI above. When I install software on my computer, I want to read the docs that are local to the installed version. There are two reasons for this:

  1. I don’t want to accidentally read docs for a different version than what is installed.
  2. I want to be able to read documentation when the internet is out.

Because the documentation of Python packages is so free form, people end up hosting their docs on random websites. That’s fine, I guess, but people end up not including the documentation in the installed module. For example, if you install boltons, you’ll note that while you can run pydoc boltons, there is no way to see the real rendered documentation via pydoc. Pretty frustrating.

On top of that, the documentation by convention is reStructuredText. rst is fine, as a format. It’s like markdown or POD (Perl’s documentation format) or whatever. But there are (at least) two very frustrating issues with it:

  1. There is no general linking format. In Perl, if I do L<DBIx::Class::Helpers>, it will link to the doc for that module. Because of the free form documentation in Python, this is impossible.
  2. It doesn’t render at all with pydoc; you just end up seeing all the noisy syntax.

And it gets worse! There is documentation for core Python that is stored on a wiki! A good example is the page about the time complexity of various builtins. There is no good reason for this documentation to not be bundled with the actual Python release.

Matt’s Script Archive

As much as the prescriptivism of Python exists to encourage the community to write things in a similar style, a ton of old code still exists that is just as crappy as all the old Perl code out there.

I love examples, and I have a good one for this. My little Python project involves parsing RSS (and Atom) feeds. I asked around and was pointed at feedparser. It’s got a lot of shortcomings. The one that comes to mind is, if you want to parse feeds without sanitizing the included html, you have to mutate a global. Worse, this is only documented in a comment in the source code.

Unicode

Python has this frustrating behaviour when it comes to printing Unicode. Basically, if the programmer is printing Unicode (the string is not bytes, but meaningful characters) to a console, Python assumes that it can encode as UTF8. If it’s printing to anything else it defaults to ASCII and will often throw an exception. This means you might have some code that works perfectly well when you are testing it interactively, and even when redirected to a file as long as the output happens to be pure ASCII, but that throws an exception the moment characters outside of ASCII show up. (Try it and see: python -c 'print(u"\u0420")' | cat) (Read more here.)
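
The defensive habit is to encode explicitly rather than trust the implicit codec (this sketch behaves the same under Python 2 and 3):

```python
s = u'\u0420'              # Cyrillic "Er", the character from the one-liner above
data = s.encode('utf-8')   # explicit encode: no guessing based on the destination
print(len(data))           # 2
```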

It’s also somewhat frustrating that the Python wiki complains that Python predates Unicode and thus cannot be expected to support it, while Perl predates even Python, but has excellent support for Unicode built into Perl 5 (the equivalent of Python 2.x.) A solid example that I can think of is that while Python encourages users to be aware of Unicode, it does not give users a way to compare strings ignoring case. Here’s an example of where that matters; if we are ignoring case, “ß” should be equal to “ss”. In Perl you can verify this by running: perl -Mutf8 -E'say "equal" if fc "ß" eq fc "ss"'. In Python one must download a package from PyPI which is documented as an order of magnitude slower than the core version from Python 3.
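
For reference, the Python 3 core version mentioned above is str.casefold, which does the full Unicode case folding that lower() does not:

```python
# Python 3 only: casefold implements full Unicode case folding
assert u'\u00df'.casefold() == 'ss'      # ß folds to ss
assert u'\u00df'.lower() == u'\u00df'    # lower() leaves ß untouched
print('equal')
```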

SIGPIPE

In Unix there is this signal, SIGPIPE, that gets sent to a process when the pipe it is writing to gets closed. Handling it can be a simple efficiency improvement, but even ignoring efficiency, it comes up in everyday shell use. Imagine you have code that reads from a database, then prints a line, then reads, etc. If you wanted the first 10 rows, you could pipe to head -n10 to both truncate after the 10th line and kill the program. In Python, this causes an exception to be thrown, so users of Python programs who know and love Unix will either be annoyed that they see a stack trace, or submit annoying patches to globally ignore SIGPIPE in your program.
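
If you would rather have your Python program act like a traditional Unix filter, one option (a sketch, with the trade-off that Python loses the chance to clean up when the pipe closes) is to restore the default disposition:

```python
import signal

# restore the Unix default: terminate quietly when the reader goes away,
# instead of raising an exception at the next write
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
```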

Overall, I think Python is a pretty great language to have available. I still write Perl most of the time, but knowing Python has definitely been helpful. Another time I’ll write a post about being a polyglot in general.

Posted Thu, Apr 21, 2016

Humane Interfaces

In this post I just want to briefly discuss and demonstrate a humane user interface that I invented at work.

At ZipRecruiter, where I work, we use a third party system called Bonusly. Each employee is given $20 in the form of 100 Zip Points at the beginning of each month. These points can be given to any other employee for any reason, and then redeemed for gift cards basically anywhere (Amazon, Starbucks, Steam, REI, and even as cash with Paypal, just to name a few.)

Of course the vast majority of users give bonusly by using the web interface, where you pick a user with an autocompleter, you select the amount with a dropdown, and you type the reason and hashtag (you must include a hashtag) in a textfield. This is fine for most users, but I hate the browser because it’s so sluggish and bloated. The other option is to use the built in Slack interface. I used that for a long time; it works like this: /give +3 to @sara for Helping me with my UI #teamwork

This is pretty good but there is one major problem: the username above is based on the local part of an email address, even though when it comes to Slack using @foo looks a lot like a Slack username. I kept accidentally giving bonusly to the wrong Sara!

Bonusly has a pretty great API and one of my coworkers released an interface on CPAN. I used this API to write a small CLI script. The actual script is not that important (but if you are interested let me know and I’ll happily publish it.) What’s cool is the interface. First off here is the argument parsing:

my ($amount, $user, $reason);

for (@ARGV) {
  if (m/^\d+$/) {
    $amount = $_;
  } elsif (!m/#/) {
    $user = $_;
  } else {
    $reason = $_;
  }
}

die "no user provided!\n"   unless $user;
die "no amount provided!\n" unless $amount;
die "no reason provided!\n" unless $reason;

The above parses an amount, a user, and a reason for the bonus. The amount must be a positive integer, and the reason must include a hashtag. Because of this, we can ignore the ordering. This solves an unstated annoyance with the Slack integration of Bonusly; I do not have to remember the ordering of the arguments, I just type what makes sense!

Next up, the user validation, which resolves the main problem:

# The following just makes an array of users like:
# Frew Schmidt <>

my @users =
  grep _render_user($_) =~ m/$user/i,
  @all_users; # hypothetical name standing in for the list fetched via the Bonusly API

if (@users > 1) {
  warn "Too many users found!\n";
  warn ' * ' . _render_user($_) . "\n" for @users;
  exit 1;
} elsif (@users < 1) {
  warn "No users match! Could $user be a contractor?\n";
  exit 2;
}

The above will keep from accidentally selecting one of many users by prompting the person running the script for a more specific match.

Of course the above UI is not perfect for every user. But I am still very pleased to have unordered positional arguments. I hope this inspires you to reduce requirements on your users when they are using your software.

Posted Sat, Apr 9, 2016

CloudFront Migration Update

When I migrated my blog to CloudFront I mentioned that I’d post about how it is going in late March. Well it’s late March now so here goes!

First off, I switched from the awscli tools to s3cmd, because it does the smart thing and only syncs if the md5 checksum is different. Not only does this make a sync significantly faster, it also reduces PUTs, which are a major part of the cost of this endeavour.

Speaking of costs, how much is this costing me? February, which was a partial month, cost a total of $0.03. One might expect March to cost more than four times that amount (still couch change) but because of the s3cmd change I made, the total cost in March so far is $0.04, with a forecast of $0.05. There is one cost that I failed to factor in: logging.

While my full blog is a svelte 36M, the CloudFront logs from the past 36 days alone are almost double that, and they are compressed with gzip! The logging incurs additional PUTs to S3 as well as an additional storage burden. The free tier includes 5G of free storage, but pulling down the log files as structured (a file per region per hour, gzipped) is a big hassle. I had over five thousand log files to download, and it took about an hour. I’m not sure how I’ll deal with it in the future, but I may periodically pull down those logs, consolidate them, and replace them with a rolled-up month-at-a-time file.

Because the logs were slightly easier to interact with than before I figured I’d pull them down and take a look. I had to write a little Perl script to parse and merge the logs. Here’s that, for the interested:

#!/usr/bin/env perl

use 5.20.0;
use warnings;

use autodie;

use Text::CSV;

my $glob = shift;
my @values = @ARGV;
my @filelisting = glob($glob);

for my $filename (@filelisting) {
  open my $fh, '<:gzip', $filename;
  my $csv = Text::CSV->new({ sep_char => "\t" });
  $csv->column_names(qw(
      date time x_edge_location sc_bytes c_ip method host cs_uri_stem sc_status
      referer user_agent uri_query cookie x_edge_result_type x_edge_request_id
      x_host_header cs_protocol cs_bytes time_taken x_forwarded_for ssl_protocol
      ssl_cipher x_edge_response_result_type
  ));
  # skip headers
  $csv->getline($fh) for 1..2;
  while (my $row = $csv->getline_hr($fh)) {
    say join "\t", map $row->{$_}, @values;
  }
}

To get all of the accessed URLs, with counts, I ran the following oneliner:

perl '*.2016-03-*.gz' cs_uri_stem | sort | uniq -c | sort -n

There are some really odd requests here, along with some sorta frustrating issues. Here are the top thirty, with counts:

  27050 /feed
  24353 /wp-content/uploads/2007/08/transform.png
  13723 /feed/
   8044 /static/img/me200.gif
   5011 /index.xml
   4607 /favicon.ico
   3866 /
   2491 /static/css/styles.css
   2476 /static/css/bootstrap.min.css
   2473 /static/css/fonts.css
   2389 /static/js/bootstrap.min.js
   2384 /static/js/jquery.js
   2373 /robots.txt
    966 /posts/install-and-configure-the-ms-odbc-driver-on-debian/
    637 /wp-content//uploads//2007//08//transform.png
    476 /archives/1352
    311 /wp-content/uploads/2007/08/readingminds2.png
    278 /keybase.txt
    266 /posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/
    225 /archives/1352/
    197 /feed/atom/
    191 /static/img/pong.p8.png
    166 /posts/concurrency-and-async-in-perl/
    155 /n/a
    149 /posts/weirdest-interview-so-far/
    144 /apple-touch-icon.png
    140 /apple-touch-icon-precomposed.png
    133 /posts/dbi-logging-and-profiling/
    126 /posts/a-gentle-tls-intro-for-perlers/
    120 /feed/atom

What follows is pretty intense navel gazing that I suspect very few people care about. I think it’s interesting but that’s because like most people I am somewhat of a narcissist. Feel free to skip it.

So /feed, /feed/, /feed/atom, and /feed/atom/ are in this list a lot, and sadly when I migrated to CloudFront I failed to set up the redirect header. I’ll be figuring that out soon if possible.

/, /favicon.ico, and /index.xml are all normal and expected. It really surprises me how many things are accessing / directly. A bunch of it is people, but a lot is feed readers. Why they would hit / is beyond me.

/wp-content/uploads/2007/08/transform.png and /wp-content//uploads//2007//08//transform.png (from this page) seems to be legitimately popular. It is bizarrely being accessed from a huge variety of User Agents. At the advice of a friend I looked more closely and it turns out it’s being hotlinked by a Vietnamese social media site or something. This is cheap enough that I don’t care enough to do anything about it.

/wp-content/uploads/2007/08/readingminds2.png is similar to the above.

/static/img/me200.gif is an avatar that I use on a few sites. Not super surprising, but as always: astounded at the number.

/robots.txt is being accessed a lot, presumably by all the various feed readers. It might be worthwhile to actually create that file. No clue.

/static/css/* and /static/js/* should be pretty obvious. I would consider using those from a CDN but my blog is already on a CDN so what’s the point! But it might be worth at least adding some headers so those are cached by browsers more aggressively.

/posts/install-and-configure-the-ms-odbc-driver-on-debian/ (link) is apparently my most popular post, and I would argue that that is legitimate. I should automate some kind of verification that it continues to work. I try to keep it updated but it’s hard now that I’ve stopped using SQL Server myself.

/archives/1352 and /archives/1352/ are the pre-Hugo URLs for the announcement of DBIx::Class::DeploymentHandler. I’m not sure why the old URL is being linked to, but I am glad I put all that effort into ensuring that old links keep working.

/keybase.txt is the identity proof for Keybase (which I have never used by the way.) It must check every four hours or something.

/posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/ (link) is a weird post of mine, but I’m glad that a lot of people are interested, because it was a lot of work to do.

/static/img/pong.p8.png, /posts/weirdest-interview-so-far/ (link), and /posts/dbi-logging-and-profiling/ (link) were all on / at some point in the month so surely people just clicked those from there.

/posts/concurrency-and-async-in-perl/ (link) and /posts/a-gentle-tls-intro-for-perlers/ (link) are more typical posts of mine, but are apparently pretty popular and I would say for good reason.

/n/a, /apple-touch-icon.png, /apple-touch-icon-precomposed.png all seem like some weird user agent thing, like maybe iOS checks for that if someone makes a bookmark?

World Wide Readership

Ignoring the seriously hotlinked image above, I can easily see where most of my blog is accessed:

perl '*.2016-03-*.gz' cs_uri_stem x_edge_location  | \
  grep -v 'transform' | cut -f 2 | perl -p -e 's/[0-9]+//' | \
  sort | uniq -c | sort -n

Here’s the top 15 locations which serve my blog:

  21330 JFK # New York
   9668 IAD # Washington D.C.
   8845 ORD # Chicago
   7098 LHR # London
   6536 FRA # Frankfurt
   5319 DFW # Dallas
   4568 ATL # Atlanta
   4328 SEA # Seattle
   3345 SFO # San Francisco
   3137 CDG # Paris
   2991 AMS # Amsterdam
   2966 EWR # Newark
   2339 LAX # Los Angeles
   1993 ARN # Stockholm
   1789 WAW # Warsaw

I’m super pleased at this, because before the migration to CloudFront all of this would be served from a single server in DFW. That was almost surely enough, but it’d be slower, especially for readers outside of the States.

Aside from the fact that I have not yet set up the redirect for the old feed URLs, I think the migration to CloudFront has gone very well. I’m pleased that I’m less worried about rebooting my Linode and that my blog is served quickly, cheaply, and efficiently to readers worldwide.

Posted Sat, Mar 26, 2016

DBI Logging and Profiling

If you use Perl and connect to traditional relational databases, you use DBI. Most of the Perl shops I know of nowadays use DBIx::Class to interact with a database. This blog post is how I “downported” some of my DBIx::Class ideas to DBI. Before I say much more I have to thank my boss Bill Hamlin, for showing me how to do this.

Ok so when debugging queries, with DBIx::Class you can set the DBIC_TRACE environment variable and see the queries that the storage layer is running. Sadly sometimes the queries end up mangled, but that is the price you pay for pretty printing.

You can actually get almost the same thing with DBI directly by setting DBI_TRACE to SQL. That is technically not supported everywhere, but it has worked everywhere I’ve tried it. If I recall correctly though, unlike with DBIC_TRACE, using DBI_TRACE=SQL will not include any bind arguments.

Those two features are great for ad hoc debugging, but at some point in the lifetime of an application you want to count the queries executed during some workflow. The obvious example is during the lifetime of a request. One could use DBIx::Class::QueryLog or something like it, but that will miss queries that were executed directly through DBI, and it’s also a relatively expensive way to just count queries.

The way to count queries efficiently involves using DBI::Profile, which is very old school, like a lot of DBI. Here’s how I got it to work just recording counts:

#!/usr/bin/env perl

use 5.12.0;
use warnings;

use Devel::Dwarn;
use DBI;
use DBI::Profile;
$DBI::Profile::ON_DESTROY_DUMP = undef;

my $dbi_profile = DBI::Profile->new(
  Path => [sub { $_[1] eq 'execute' ? ('query') : (\undef) }],
);

$DBI::shared_profile = $dbi_profile;

my $dbh = DBI->connect('dbi:SQLite::memory:');
my $sth = $dbh->prepare('SELECT 1');
$sth->execute for 1..3;

$sth = $dbh->prepare('SELECT 2');
$sth->execute for 1..3;

my @data = $dbi_profile->as_node_path_list;
Dwarn \@data;

And in the above case the output is:


The outermost arrayref is supposed to contain all of the profiled queries, so each arrayref inside of that is a query, with its profile data as the first value (another arrayref) inside, and all of the values after that first arrayref are user configurable.

So the above means that we ran six queries. There are some numbers about durations but they are so small that I won’t consider them carefully here. See the link above for more information. Normally if you had used DBI::Profile you would see two distinct queries, with a set of profiling data for each, but here we see them all merged into a single bucket. All of the magic for that is in my Path code references.

Let’s dissect it carefully:

$_[1] eq 'execute' # 1
  ? ('query')      # 2
  : (\undef)       # 3

Line 1 checks the DBI method being used. This is how we avoid hugely inflated numbers. We are trading off some granularity here for a more comprehensible number. See, if you prepare 1000 queries, you are still doing 1000 roundtrips to the database, typically. But that’s a weird thing, and telling a developer how many “queries they did” is easier to understand when that means simply executing the query.

In line 2 we return ('query'). This is what causes all queries to be treated as if they were the same. We could have returned any constant string here. If we wanted to do something weird, like count based on type of query, we could do something clever like the following:

return (\undef) unless $_[1] eq 'execute';
local $_ = $_;          # DBI::Profile puts the statement text in $_
s/^\s*(\S+).*$/\U$1/s;  # keep just the leading SQL keyword, uppercased
return ($_);

That would create a bucket for SELECT, UPDATE, etc.

Ok back to dissection; line 3 returns (\undef), which is weird, but it’s how you signal that you do not want to include a given sample.

So the above is how you generate all of the profiling information. You can be more clever and include caller data or even bind parameters, though I’ll leave those as a post for another time. Additionally, you could carefully record your data and then do some kind of formatting at read time. Unlike DBIC_TRACE where you can end up with invalid SQL, you could use this with post-processing to show a formatted query if and only if it round trips.

Now go forth; record some performance information and ensure your app is fast!

UPDATE: I modified the ON_DESTROY_DUMP to set it to undef instead of an empty code reference. This correctly avoids a lot of work at object destruction time. Read this for more information.

Posted Thu, Mar 24, 2016

How to Enable ptrace in Docker 1.10

This is just a quick blog post about something I got working this morning. Docker currently adds some security to running containers by wrapping the containers in both AppArmor (or presumably SELinux on RedHat systems) and seccomp eBPF based syscall filters. This is awesome and turning either or both off is not recommended. Security is a good thing and learning to live with it will make you have a better time.

Normally ptrace is disabled by the default seccomp profile. ptrace is used by the incredibly handy strace. If I can’t strace, I get the feeling that the walls are closing in, so I needed it back.

One option is to disable seccomp filtering entirely, but that’s less secure than just enabling ptrace. Here’s how I enabled ptrace but left the rest as is:

A handy perl script

#!/usr/bin/env perl

use strict;
use warnings;

# for more info check out

# This script simply helps to mutate the default docker seccomp profile.  Run it
# like this:
#     curl | \
#           build-seccomp > myapp.json

use JSON;

my $in = decode_json(do { local $/; <STDIN> });
push @{$in->{syscalls}}, +{
  name => 'ptrace',
  action => 'SCMP_ACT_ALLOW',
  args => []
} unless grep $_->{name} eq 'ptrace', @{$in->{syscalls}};

print encode_json($in);

In action

So without the custom profile you can see ptrace not working here:

$ docker run alpine sh -c 'apk add -U strace && strace ls'
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
strace: test_ptrace_setoptions_for_all: PTRACE_TRACEME doesn't work: Operation not permitted
strace: test_ptrace_setoptions_for_all: unexpected exit status 1

And then here is using the profile we generated above:

$ docker run --security-opt "seccomp:./myapp.json" alpine sh -c 'apk add -U strace && strace ls'
2016/03/18 17:08:53 Error resolving syscall name copy_file_range: could not resolve name to syscall - ignoring syscall.
2016/03/18 17:08:53 Error resolving syscall name mlock2: could not resolve name to syscall - ignoring syscall.
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
execve(0x7ffe02456c88, [0x7ffe02457f30], [/* 0 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x7f0df919c048) = 0
set_tid_address(0x7f0df919c080)         = 16
mprotect(0x7f0df919a000, 4096, PROT_READ) = 0
mprotect(0x5564bb1e7000, 16384, PROT_READ) = 0
getuid()                                = 0
ioctl(0, TIOCGWINSZ, 0x7ffea2895340)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
stat(0x5564bafdde27, {...})             = 0
open(0x5564bafdde27, O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 512
lstat(0x5564bb1ec860, {...})            = 0
lstat(0x5564bb1ec900, {...})            = 0
lstat(0x5564bb1ec9a0, {...})            = 0
lstat(0x5564bb1eca40, {...})            = 0
lstat(0x5564bb1ecae0, {...})            = 0
lstat(0x5564bb1ecb80, {...})            = 0
lstat(0x5564bb1ecc20, {...})            = 0
lstat(0x5564bb1eccc0, {...})            = 0
lstat(0x5564bb1ecd60, {...})            = 0
lstat(0x5564bb1ece00, {...})            = 0
lstat(0x5564bb1ecea0, {...})            = 0
lstat(0x5564bb1ecf40, {...})            = 0
lstat(0x5564bb1ecfe0, {...})            = 0
lstat(0x7f0df919e6e0, {...})            = 0
lstat(0x7f0df919e780, {...})            = 0
lstat(0x7f0df919e820, {...})            = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 0
close(3)                                = 0
ioctl(1, TIOCGWINSZ, 0x7ffea2895278)    = -1 ENOTTY (Not a tty)
writev(1, [?] 0x7ffea2895210, 2)        = 4
writev(1, [?] 0x7ffea2895330, 2)        = 70
exit_group(0)                           = ?
+++ exited with 0 +++

A final warning

The above is not too frustrating and is more secure than disabling seccomp entirely, but enabling ptrace as a general course of action is likely to be wrong. I am doing this because it helps with debugging stuff inside of my container, but realize that for long running processes you can always strace processes that are running in the container from the host.

Posted Fri, Mar 18, 2016

When I Planned on Moving to Australia

Many of you do not know that I was born on the Gulf Coast of Mississippi. I lived there, with a brief intermission in Oconomowoc, Wisconsin, until I moved to Texas to go to college.

That first year of school is rife with good memories, but there was a dark spot. Specifically, Hurricane Katrina.

Katrina was a big deal. To this day there are houses that are just gone, with nothing but a slab and a lot of weeds in their place.

You people who do not live in a place where hurricanes are common (two or more per year) likely think of this like you might think of someone who left their house unlocked or something. The problem is, as I’ve already given away, hurricanes are not rare on the Gulf Coast. They happen multiple times every year and the vast majority are non-events.

You judge people for not evacuating, but that’s because you can do that safely from your living room. Nearly every summer in Mississippi I experienced at least one power outage lasting days due to a hurricane. The worst flooding ever did was damage the carpet in my awesome detached bedroom.

Katrina started like all hurricanes do: on the news. Doom was predicted. Most hurricanes would hit the barrier islands and suddenly slow down, and the twenty-four hour news would stop focusing on southern Mississippi and go back to the murders and rapes the viewers so eagerly lap up.

I vividly remember watching the news and seeing Katrina heading directly for The Coast, confidently assuming that it would veer east into the sore thumb of Florida.

Then it didn’t.

After Katrina landed, cell phones stopped working. My cell, even in TX, was not reachable for weeks, even by other people in TX. I couldn’t reach my family via cell or landline. I was positive that my entire nuclear family was wiped out. In a daze I expected to run away to Australia, selling what little I owned, with the hopes of buying a motorcycle after I arrived.

Hurricane Sandy has been much more widely discussed, presumably because she was more recent, and more weird (ending up where she did.) Take note of the statistics that wikipedia helpfully includes at the top right of the article: more than a thousand people died because of Katrina. You can’t even imagine that many human beings. That’s a lot.

After a week or two I finally was able to determine that my family survived. Many people look down on Facebook, Twitter, and other social media because it didn’t exist when they were kids or some stupid thing like that. If it had not been for LiveJournal I would not have found out that my family was still alive.

Here’s a quote from August 31, 2005 from my blog at the time:

So [ … ] a few other discouraging things happened. I won’t go into details; it would take too long. Just know that it was a bummer and didn’t feel great. And then this hurricane comes blowing through everywhere and I have [wtf sic?] if my family is even alive. I am nearly positive that my house is gone, but that’s really no big deal. Anyways. What a bummer eh? Don’t pity me. Pray for my family.

Note: If you have seen The Life Aquatic, I feel like Steve Zissou did when his ship got pirated and everything turned red and he shot everyone.

I can’t find the post, because I think it was on someone else’s page, but I found out that my family was ok when my friend Kate Mendoza (née Jarvis) commented on some post saying that she knew my family had survived.

If there are lessons to learn from this story, I think they are:

  1. You could lose it all pretty quickly; enjoy your loved ones while you can.
  2. Stop looking at a finger when someone is pointing at the moon; social media is people.
Posted Sat, Mar 12, 2016

Weirdest Interview So Far

This is a pretty good story and I want you all to hear it. When I was graduating from college I interviewed with three companies. Two of them (MTSI and Rockwell Collins) offered me jobs. The other one, Empire Systems Inc., did not.

So Empire Systems Inc. was founded by a couple of LeTourneau (my alma mater) alumni. I didn’t know the two of them myself, but we did overlap by a year or two. Anyway, back to the interviews.

They came to school for the career fair and gave their spiel, which changed a couple times, but what I recall was something like the (non-existent at the time) iPad, called the Marquess. They needed EEs and programmers and whatnot. It sounded kinda cool and I really wanted to work there, so I submitted my (probably worthless) resume. I got an interview; I suspect everyone who submitted a resume did, but who knows.

Interview I

So they invited me into some room; I think it was a classroom, but it could have been some kind of break room. The important thing to know is that it had a projector. We introduced ourselves, sat down, and got started.

The first question was likely something boring, like, “How would you handle conflict on the job?” I don’t remember. What I do remember is that the next question was: “There was a dot moving around on the projector while you answered; which quadrant was it primarily in?”

The rest of the interview went like that, where they’d ask a normal question, and then some oddball thing. The other oddball questions I remember were:

  • “We’re going to show faces of famous people; identify them.”
  • “Identify the following animals as predator or prey.”

That last question, with the animals, was especially funny, because I watched the first two (tiger and shark?) and said “predator,” and then didn’t say any more, assuming that they all were. I was wrong but didn’t correct myself.

Interview II

I remember waiting a long time to hear back. I followed up a couple times before getting the coveted on site interview. But finally they raised the capital they needed to hire more engineers. They had me, another peer, and some others out. I remember vividly that for some reason, presumably lack of planning, they brought us via train instead of plane, even though it was not any cheaper. (As an aside, that train ride was when AJAX finally clicked for me, and I think I leveled up as an engineer, I guess to level 1?)

I remember two things about that situation. The first was that the interview, again, was crazy. It first featured a test. No; not a quiz where you have to implement sorting algorithms or whatever. This quiz asked details about the history of France and various parts of the Middle East. It was huge too, like, maybe twenty or thirty pages. Then after that they did a group interview. And by that I mean: one interviewer to many interviewees. So they’d ask a question and we’d all have to vie for their attention or whatever.

I recall them asking something like, “if you could invent anything, what would it be?” I think that’s a good question and it has made me think a lot ever since. Aside from that the whole thing was a joke.

The other thing I recall, with fondness, was their showing off of how it was a lifestyle job. They talked about doing karate before work and wearing ties every day and all kinds of stupid stuff like that, but it didn’t matter because it was all vapor (keep reading.) The good memory was that they had us over for a game night with some of the people who worked at Empire and their spouses and friends. That was a lot of fun. It felt really homey.

Backstory and Embezzlement

One of my friends at MTSI actually did a project with the founder of Empire in college. The friend in question told me that said founder tended to be very slippery when it came to his part of the project and had some bogus excuses relating to his other classes. Specifically he “charmed the profs with just mountains of documentation and presentation skills.”

Well, it turns out he was slippery in real life too. About a year after I joined MTSI someone sent a link about how the founder effectively stole half a million bucks. The whole thing sorta reminds me of this.

I hope you enjoyed this little walk down memory lane; I know I did!

Posted Sun, Mar 6, 2016

Migrating My Blog from Linode to CloudFront


I have just completed the process of migrating my blog to CloudFront. There are a few reasons for this. Initially I had planned to migrate everything on my Linode to OVH, which has DDoS mitigation and I think even uptime SLAs. The reasoning behind that was that the Linode kept getting DDoS’ed and I was sick of it.

Additionally, in January I went to SCALE14x and Eric Hammond (who was introduced to me by Andrew Grangaard) pointed out that by using the current generation of AWS tooling (Lambda, DynamoDB, etc) you can reduce total cost to less than the minimum pricing on a Linode. The cost of my Linode isn’t super expensive (less than the price of Netflix) but every little bit helps. On top of that we use the AWS stuff at work so another chance to be familiar with AWS is a good thing.

Finally, after the most recent security fiasco I just feel safer using infrastructure that is more well tested in general. Plus I think I can get away with moving most of my stuff off of VMs, which means I’m less likely to screw something up.

As a side note, I have been self-hosting my blog since 2007. I am loath to do external hosting, as external hosts all seem to end up dying at some point anyway. I did briefly consider hosting on GitHub, but you either have to change your domain name or have no TLS (more on that later), so I decided to go the manual hosting route.


For small stuff like this, it can be worthwhile to make a distinct AWS account for each project. I made a special blog account to help me with accounting if the total cost of this ends up being more than I expect. Because I have my own domain I have as many email addresses as I want, so I just made a new one specifically for my blog, and then used it to make a new AWS account.

After creating the blog account I enabled Cost Explorer. I have no idea why this has to be turned on, because it’s super helpful to be able to use. Next I activated MFA (you know, for security!) Maybe I should have done that first. I could do something with IAM, I’m sure, but it would be overkill for something as single-task as this that only I will ever use.

I followed instructions I found here to set up the S3 and CloudFront parts. The only issue I ran into was that I forgot to set the CNAME both in DNS and in the CloudFront config. To actually sync my blog I use the following command:

aws s3 sync --delete . s3://

The --delete flag is so that files that no longer exist on my side also get removed from the remote side.

At this point you should be able to test that everything is mostly working by visiting the endpoint that the bucket provides. The CloudFront part usually takes a while because it has to sync all over the world and wait for DNS too.

Because I care about my readers I only serve my blog over HTTPS. It’s not that I think you are reading my blog in secret; I just don’t want malware to be injected by messed-up access points. Because of that I had to get a certificate. If I were serving from US East I could have gotten free, auto-renewing certificates from Amazon. Sadly I didn’t think to do this, even though it would have been trivial since I don’t really care where the site is served from. StartSSL also gives free certificates, so that’s what I used. To upload your certificate you need to use a command like this:

aws iam upload-server-certificate \
      --server-certificate-name blog_cert \
      --certificate-body file:// \
      --private-key file:// \
      --certificate-chain file://pwd/ \
      --path /cloudfront/blog/

Getting and creating the certificate is not something I’m super interested in writing about, as it’s pretty well documented already.


Clearly the fact that I pulled the trigger on this project means that I think it was worth it, so here are some of the benefits to using CloudFront to host my blog.


After reading the nightmare Glacier post last month I committed to reading and understanding the pricing models of the various AWS services before using them. With that in mind, I read about the pricing of the stuff I’ll be using for my blog before embarking on this project.

The S3 pricing is pretty understandable. I’ll pay 3¢/mo for the storage, as my blog is about 35 MB of HTML and images total. Uploading the entire blog afresh (which I sorta assume is what sync does, but I’m not sure) is about 16k files, which (rounding up) is 2¢. So if sync works inefficiently a post is likely to cost me about 5¢, including fixing typos or whatever. Assuming a lot of posts, let’s say sixteen a month, that adds up to 80¢ per month. There is no charge to transfer from S3 to CloudFront, so that comes to a maximum of 83¢ per month for S3.

The CloudFront pricing is even simpler. Assuming 100% of the traffic from my Linode is my blog (it is without a doubt mostly IRC, but for accounting purposes let’s assume the worst) and that it is all from India (again, nope), the data transfer charge from CloudFront will be 51¢. Assuming every single request on my server is to my blog (another verifiable falsehood), requests would add a whopping $1.10 to the monthly bill. That adds up to $1.61 per month for CloudFront.

So, worst case scenario, my monthly bill is $2.44 a month. I suspect it will likely be much less than that. I’ll try to remember at the end of March to update this post with what the real price ends up being.
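The arithmetic above is simple enough to sanity-check in a few lines. Here’s a quick Python sketch of the worst-case math, using only the figures estimated in the paragraphs above (the variable names are my own):

```python
# Worst-case monthly bill, all amounts in dollars.
s3_storage = 0.03         # ~35 MB of HTML and images
cost_per_post = 0.05      # a few inefficient syncs' worth of PUT requests
posts_per_month = 16      # deliberately generous

s3_total = s3_storage + cost_per_post * posts_per_month

cloudfront_transfer = 0.51   # all traffic priced at the India rate
cloudfront_requests = 1.10   # every request assumed to hit the blog
cloudfront_total = cloudfront_transfer + cloudfront_requests

worst_case = round(s3_total + cloudfront_total, 2)
print(worst_case)
```

Running it reproduces the $2.44 worst case; the real bill should come in well under that.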


Unlike my Linode, which always resided in the wonderful city of Dallas, TX, CloudFront specifically exists to be global. So if you read my blog from the UK (I’m sure there are some!) or Japan (eh… maybe not) it should be a lot more snappy now.


Sometimes my Linode gets rebooted for hypervisor updates, or worse, I mess up my Apache config or something. The above setup is well isolated from all my other stuff, so it should be very reliable.


But it’s not all unicorns, rainbows, penny-whistles, and blow. There are some problems!


The above calculations are based on past history. If I get DDoS’d directly I will suddenly get a bill for a thousand bucks, instead of my server just falling over. That’s something that gives me serious pause. My boss told me that you can use Lambda as a rate-limiting tool. I expect to look into that before too long, especially because I have other plans for Lambda anyway.

Slow to Update

Unsurprisingly, because CloudFront is a CDN, there is a TTL on the cached data, so sometimes it can take a few minutes for a modification to the blog to go live. Not a huge deal, but good to know anyway.

Overall this has been a relatively painless process and I think it is worth it. I hope this helps anyone considering migration to AWS.

Posted Sat, Feb 20, 2016


While CGI is a fairly well-established, if aging, protocol, UCSPI seems fairly obscure. I suspect that UCSPI may see a resurgence, as with systemd, projects finally have a reason to support running in such a mode. But here I go, burying the lede.

CGI Refresher

Just as a way of illustrating by example, I think that I should explain (hopefully only by way of reminder) how CGI works. Basically a server (usually Apache, IIS, or lately, nginx) waits for a client to connect, and when it does, it parses the request and all of the request headers. They look something like this:

POST /upload?test=false HTTP/1.1
User-Agent: blog/0.0.1
Content-Type: text/plain
Content-Length: 4

frew
And then various parts of the above go into environment variables; for example, test=false would become the value of QUERY_STRING. Then the body (in this example, frew) would be written to the standard input of the CGI script. While this seems a little fiddly compared to some of the more modern APIs and frameworks, it is nice because you don’t even need a language that supports sockets. You can even write a simple script in shell!


The response is almost just whatever the script prints to standard out, though perversely there is a small bit of modification that happens, so the server has to parse some of the output, which seems like a huge oversight in the specification. Specifically, instead of allowing the script to print:

HTTP/1.1 200 OK
Content-Type: text/html

It instead must write:

Status: 200 OK
Content-Type: text/html

and is even allowed to write:

Content-Type: text/html
Status: 200 OK

but the server still must translate that to:

HTTP/1.1 200 OK
Content-Type: text/html

This means that a correct server may need to buffer an unbounded (by the spec) number of headers before it gets to the Status header. Ah, the joys of implementing a CGI server.
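That buffering can be sketched in a few lines of Python (a toy illustration with invented names, not code from any real server): nothing can be emitted until the headers have been scanned for Status.

```python
def translate_cgi_output(cgi_output):
    """Buffer the CGI script's headers, pull out the Status header
    (wherever it appears), and emit a proper HTTP status line."""
    head, sep, body = cgi_output.partition("\r\n\r\n")
    status = "200 OK"  # the CGI default when Status is omitted
    headers = []
    for line in head.split("\r\n"):
        if line.lower().startswith("status:"):
            status = line.split(":", 1)[1].strip()
        else:
            headers.append(line)
    return "HTTP/1.1 " + status + "\r\n" + "\r\n".join(headers) + sep + body

out = translate_cgi_output("Content-Type: text/html\r\nStatus: 200 OK\r\n\r\n")
```

Because Status may legally be the last header, every header line before it has to be held in memory first.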

What is UCSPI

UCSPI stands for UNIX Client-Server Program Interface. Basically, the way it works is that you have a tool that opens a socket and waits for new connections. When it gets a new connection it spins up a new process, connecting the standard input and standard output of that process to the socket.
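The whole model fits in a few lines. Here is a minimal Python sketch (Unix-only, and nowhere near production quality; the function names are mine) of the per-connection hand-off a UCSPI server performs:

```python
import os
import socket

def spawn_ucspi_handler(conn, argv):
    """Fork a child whose stdin and stdout are the connection,
    then exec the handler program -- the heart of the UCSPI model."""
    pid = os.fork()
    if pid == 0:
        os.dup2(conn.fileno(), 0)  # child stdin  <- socket
        os.dup2(conn.fileno(), 1)  # child stdout -> socket
        os.execvp(argv[0], argv)
    conn.close()  # the parent is done with this connection
    return pid

def ucspi_listen(port, argv):
    """Accept loop: one fresh handler process per connection."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    while True:
        conn, _addr = srv.accept()
        spawn_ucspi_handler(conn, argv)
```

The handler program never knows a socket is involved; it just reads stdin and writes stdout, which is exactly why plain CGI scripts slot in so neatly.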

Here’s an interesting thing that I have needed to do that I could not do without UCSPI. Because each connection in the UCSPI model ends up being a separate set of processes, a handler can restart the parent UCSPI worker and still finish its connection.

This means that, for example, I can have a push to GitHub automatically update my running server, without any weird side mechanisms like a second updater service or, worse, a cronjob. I just do a git fetch and git reset --hard @{u}, and the next time a client connects it will be running the new code.

Here is how I did that. At some point I expect to make the automatic updater more reliable and generic.

Another sorta nice thing, though this is very much a tradeoff, is that the process that holds the listening port is very small (2M on my machine) compared to, say, an actual Plack server (which is an order of magnitude bigger). On the other hand, if your actual CGI script has a lot of dependencies it can take a long time to start, so this may not be a good long-term solution.

Note that there are problems, of course. Aside from the increased cost of spinning up a new process per connection, you also have to be careful to avoid printing to standard out. If you do, you are almost guaranteed to print your output before any headers, which ends up being an invalid response. On the other hand, you can do a bunch of weird old-school stuff like chdir-ing in a script and not worry about global state changes.

Aside: Plack under CGI

Because my web apps thus far have been implemented using PSGI, an abstraction over HTTP, they can run under their own servers or under CGI directly. I only really needed to do a couple of small things to make my application run under CGI.

I hope you found this interesting!

Posted Wed, Feb 10, 2016


I’ve really enjoyed writing Rust, lately. I posted yesterday about what I’m doing with it. In the meantime here are some immediate reactions to writing Rust:

Documentation
The documentation is pretty good. It could be better, like if every single method had an example included, but it could be a lot worse. And the fact that a lot (though not all for some reason) of the documentation has links to the related source is really handy.

Language Itself

The language feels good. This is really hard to express, but the main thing is that type inference makes a lot of the type definitions feel less burdensome than in, for example, Java and friends. It also feels stratospherically high level, with closures, object orientation, destructuring, handy methods on basic types like strings, and much more. Yet it’s actually pretty low level.

Community
The community is awesome! I have never had as many friendly and willing people help me as a total noob before. Maybe it’s because Rust has a code of conduct or maybe it’s because Mozilla are nice people. I appreciate that there are people who actually know what is up answering questions at all hours of the night; they also generally assume competence. While assuming competence may make the total number of questions asked greater, it makes the entire exchange much more pleasant. More of this please!

Error Messages

The error messages are very good. For example, check this out:

$ rustc
84:43 error: unresolved name `n`. Did you mean `v`? [E0425]
             Ok(v) => { *content_length = n },
                                          ^
84:43 help: run `rustc --explain E0425` to see a detailed explanation
error: aborting due to previous error

They all give some context like that, and then have an error code (the --explain thing) that lets you get a more complete description of what you did and how you can fix it. Sometimes the errors can be pretty inscrutable for a new user though:

$ rustc
219:7 error: the trait `core::ops::FnOnce<()>` is not implemented for the type `()` [E0277]
    let mut c_stdin = f.stdin.unwrap_or_else({
        warn!("Failed to get child's STDIN");
        early_exit("500 Internal Server Error");
    });
219:7 help: run `rustc --explain E0277` to see a detailed explanation
error: aborting due to previous error


Searching for examples of stuff online is surprisingly hard. I don’t know if that’s because Rust is also the name of a popular video game or if it’s just because the language is fairly new. I hope to help remedy this in general.


There is certainly more, like the included package management system or other interesting language features. I may post more about those later, but the above is stuff that I ran into during my week long foray into Rust. Hope this helps!

Posted Tue, Feb 9, 2016

Announcing cgid

This post is an announcement of cgid.

Over the past week I developed a small UCSPI based single-file CGI server. The usage is very simple, due to the nature of the tool. Here’s a quick example of how I use it:

tcp-socket-listen 6000
tcp-socket-accept --no-delay

If you don’t know anything about UCSPI, this will look like nonsense to you. I have a post that I’ll publish later this week about UCSPI, so you can wait for that, or you can search for it and find lots of documents about it already.


As a side note, cgid was written in Rust. I have a post about Rust itself in the queue, but I think discussing the “release process” of a binary tool like cgid at release time is sensible. The procedure for releasing went something like this:

git tag v0.1.0 -m 'Release v0.1.0'

# release to
cargo package
cargo publish

cargo build --release
# fiddle with github webpage to put binaries on the release

This is a joke compared to the spoiling I’ve had from Dist::Zilla, which is what I use when releasing packages to CPAN. At some point I’d like to automate Rust releases as much as Rik has automated releasing to CPAN.

I’ll keep my eye out for more things that deserve to be written in Rust, as I enjoyed the process, but I expect that ideas which deserve to be written in Rust are few and far between, for me. It is pretty cool that, basically not knowing Rust, I successfully implemented a tool that doesn’t exist anywhere else in less than two weeks.

Posted Mon, Feb 8, 2016

Handy Rust Macros

I’ve been writing some Rust lately and have been surprised at the dearth of examples that show up when I search for what seems obvious. Anyway, I wrote a couple macros that I’ve found very handy. The first seems like it should almost be core:

use std::io::{self, Write};

macro_rules! warn {
    ($fmt:expr) => ((writeln!(io::stderr(), $fmt)).unwrap());
    ($fmt:expr, $($arg:tt)*) => ((writeln!(io::stderr(), $fmt, $($arg)*)).unwrap());
}

// Examples:
warn!("This goes to standard error");
warn!("Connected to host: {}", hostname);

This allows you to trivially write to standard error, and it panics if it fails to write to standard error. If it weren’t for this final detail I’d actually submit it as a pull request for Rust itself. For my code, being able to print to the standard filehandles is critical, so crashing if STDERR is closed makes sense, but there are many situations where that is not reasonable.

The next example is the more interesting one, a macro that uses an environment variable at compile time to modify what it does:

macro_rules! debug {
    ($fmt:expr) => (
        match option_env!("HTTPD_DEBUG") {
            None => (),
            Some(_) => warn!($fmt),
        }
    );
    ($fmt:expr, $($arg:tt)*) => (
        match option_env!("HTTPD_DEBUG") {
            None => (),
            Some(_) => warn!($fmt, $($arg)*),
        }
    );
}

// Examples:
debug!("This goes to standard error");
debug!("Connected to host: {}", hostname);

debug! works just like warn!, but if the HTTPD_DEBUG environment variable is unset at compile time it is as if nothing was ever written. Sorta handy, but what’s more important is the general pattern.

I hope to be blogging more about Rust in the future. I hope this helps!

Posted Sat, Feb 6, 2016

Checking sudoers with visudo in SaltStack

At work we are migrating our server deployment setup to use SaltStack. One of the things we do at deploy time is generate a sudoers file, but as one of our engineers found out, if you do not verify the contents of the sudoers file before deploying it you will be in a world of hurt.

Salt actually has a pretty good built-in tool for this, but it’s very poorly documented. This is one of the most obvious uses for it, and because Googling for it didn’t work for me, I figured I’d make it findable for someone else.

The feature is the check_cmd flag on file.managed. The current documentation for the feature is:

The specified command will be run with the managed file as an argument. If the command exits with a nonzero exit code, the state will fail and no changes will be made to the file.

This isn’t super clear. Salt takes the generated content, puts it in a tmpfile, runs the command with the tmpfile path appended, and then, only if the command succeeds, replaces the real file’s contents with the tmpfile. So here is how I used it to verify sudoers:

sudoers_config:
  file.managed:
    - name: {{ }}
    - user: root
    - group: root
    - mode: 0440
    - source: {{ sudo.config_file.source }}
    - template: {{ sudo.config_file.template }}
    - check_cmd: /usr/sbin/visudo -c -f
    - require:
      - pkg: sudo
      - group: sudo

Hope this helps!

Posted Thu, Jan 14, 2016