Reap slow and bloated Plack workers

As mentioned before at ZipRecruiter we are trying to scale our system. Here are a couple ways we are trying to ensure we maintain good performance:

  1. Add timeouts to everything
  2. Have as many workers as possible

Timeouts

Timeouts are always important. A timeout that is too high will allow an external service to starve your users. A timeout that is too low will give up too quickly. No timeout is basically a timeout that is too high, no matter what. My previous post on this topic was about adding timeouts to MySQL. For what it’s worth, MySQL does have a default timeout, but it’s a year, so it’s what most people might call: too high.

Normally people consider timeouts for external services, but it turns out they are useful for our own servers as well. Sometimes people accidentally write code that is slow in unusual cases, so while it’s fast 99.99% of the time, that remaining 0.01% can be outage-inducing: the slow requests pile up and tie up web workers that could otherwise be serving traffic.

One way to add timeouts to code is to make everything asynchronous and tie all actions to clock events, so that if (for example) a database query doesn’t come back before the clock event fires, you get some kind of error. This is all well and good, but it means that you suddenly need async versions of everything, and I have yet to see universal RDBMS support for async. If you need to go that route you are almost better off rewriting all of your code in Go.

The other option is to bolt on an external watchdog, very similar to the MySQL reaper I wrote about last time.

More Workers

Everywhere I have worked the limiting factor for more workers has been memory. There are a few basic things you can do to use as little memory as possible. First and foremost, with most of these systems you are using some kind of preforking server, so you load up as many libraries before the fork as possible. This will allow Linux (and nearly all other Unix implementations) to share a lot of the memory between the master and the workers. On our system, in production, most workers are sharing about half a gig of memory with the master. That goes a really long way when you have tens of workers.
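For example, with a preforking PSGI server such as Starman (using its --preload-app option), that mostly amounts to pulling the heavy modules into the .psgi file so they are loaded before the fork. A minimal sketch, with placeholder module names:

# app.psgi -- load heavy dependencies up front so the preforking master
# shares those pages copy-on-write with its workers
use strict;
use warnings;

use MyApp::Schema; # placeholders for your big, slow-to-load modules
use MyApp::Web;

MyApp::Web->to_app;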

The other thing you can do is attempt to not load lots of stuff into memory at all. Due to Perl’s memory model, once a lot of memory has been allocated it is never returned to the operating system; it is instead reserved for later use by the process. Instead of slurping a whole huge file into memory, just process it incrementally.
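For example, reading a file line by line keeps the high-water mark low; a trivial sketch (the grep-for-ERROR bit is just a stand-in for real per-line work):

use strict;
use warnings;

my $path = shift;

open my $fh, '<', $path or die "can't open $path: $!";

my $errors = 0;
while (my $line = <$fh>) {
  # stand-in for whatever per-line work you actually need
  $errors++ if $line =~ /ERROR/;
}

close $fh;

print "$errors\n";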

Lastly, you can add a stop gap solution that fits nicely in a reaper process. In addition to killing workers that are taking too long serving a single request, you can reap workers that have allocated too much memory.

smaps

Because of the sharing mentioned above, we care about private (that is, not shared) memory more than anything else. Killing a worker because the master has gotten larger is definitely counterproductive. We can leverage Linux’s /proc/[pid]/smaps for this. The good news is that if you simply parse that file for a given worker and sum up the Private_Clean and Private_Dirty fields, you’ll end up with all of the memory that only that process has allocated. The bad news is that it can take a while. Greater than ten milliseconds seems typical; that means that adding it to the request lifecycle is a non-starter. This is why baking this into your Plack reaper makes sense.
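If you are curious what that summing looks like without a module, here is a minimal sketch (the full listing below uses Linux::Smaps instead):

sub private_kb {
  my $pid = shift;

  open my $fh, '<', "/proc/$pid/smaps" or return;

  my $kb = 0;
  while (my $line = <$fh>) {
    # each mapping reports lines like "Private_Dirty:        12 kB"
    $kb += $1 if $line =~ /^Private_(?:Clean|Dirty):\s+(\d+)\s+kB/;
  }

  return $kb; # kilobytes of memory that only this process has allocated
}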

Plack Reaper

The listing below is a sample of how to make a Plack reaper that resolves the above issues. It sends USR1 to workers that have spent too long serving a single request, simply killing them. The worker is expected to have code that intercepts USR1, logs what request it was serving (preferably in the access log), and exits. USR2 instead allows the worker to finish serving its current request, if there is one, and then exit. You can leverage psgix.harakiri for that.

We also use Parallel::Scoreboard, which is what Plack::Middleware::ServerStatus::Lite uses behind the scenes.

(Note that this is incredibly simplified from what we are actually using in production. We have logging, more robust handling of many various error conditions, etc.)

#!/usr/bin/perl

use strict;
use warnings;

use Linux::Smaps;
use Parallel::Scoreboard;
use JSON 'decode_json';

my $scoreboard_dir = '/tmp/' . shift;
my $max_private    = shift;

my $scoreboard = Parallel::Scoreboard->new(
  base_dir => $scoreboard_dir,
);

while (1) {
  my $stats = $scoreboard->read_all;

  for my $pid (keys %$stats) {
    my %status = %{decode_json($stats->{$pid})};

    # undefined time will become zero, age will be huge, should get killed
    my $age = time - $status{time};

    kill USR1 => $pid
      if $age > timeout(\%status);

    my $smaps = Linux::Smaps->new($pid);

    my $private = $smaps->private_clean + $smaps->private_dirty;
    kill USR2 => $pid
      if $private > $max_private;
  }

  sleep 1;
}

sub timeout {
  return 10 * 60 if shift->{method} eq 'POST';
  2 * 60
}
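For completeness, here is a rough sketch of the worker-side half described above. This is not our production middleware and the names are invented; it assumes a harakiri-capable server such as Starman or Starlet:

package MyApp::Middleware::Reapable;

use strict;
use warnings;

use parent 'Plack::Middleware';

my $in_request     = 0;
my $exit_when_done = 0;

# USR2: finish the current request, if any, then exit
$SIG{USR2} = sub {
  $exit_when_done = 1;
  exit 0 unless $in_request; # idle worker, nothing to finish
};

sub call {
  my ($self, $env) = @_;

  $in_request = 1;

  # USR1: record what we were serving, then die immediately
  local $SIG{USR1} = sub {
    warn sprintf "reaped while serving %s %s\n",
      $env->{REQUEST_METHOD}, $env->{PATH_INFO};
    exit 1;
  };

  my $res = $self->app->($env);

  # ask the server to retire this worker once the response is sent
  $env->{'psgix.harakiri.commit'} = 1
    if $exit_when_done && $env->{'psgix.harakiri'};

  $in_request = 0;

  return $res;
}

1;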

I am very pleased that we have the above running in production and increasing our effective worker count. Maybe next time I’ll blog about our awesome logging setup, or how I (though not ZipRecruiter) think strictures.pm should be considered harmful.

Until next time!

Posted Wed, Jun 29, 2016

AWS Retirement Notification Bot

If you use AWS a lot you will be familiar with the “AWS Retirement Notification” emails. At ZipRecruiter, when we send our many emails, we spin up tens of servers in the middle of the night. There was a period for a week or two where I’d wake up to one or two notifications each morning. Thankfully those servers are totally ephemeral. By the time anyone even noticed the notification the server was completely gone. Before I go further, here’s an example of the beginning of that email (the rest is static:)

Dear Amazon EC2 Customer,

We have important news about your account (AWS Account ID: XXX). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-deadbeef) in the us-east-1 region. Due to this degradation, your instance could already be unreachable. After 2016-07-06 02:00 UTC your instance, which has an EBS volume as the root device, will be stopped.

Note that the identifier there is totally not useful to a human being. Every time we got this notification someone on my team would log into the AWS console, look up the server, and email the team: “the server is gone, must have been one of the email senders” or maybe “the server is an email sender and will be gone soon anyway.”

Like many good programmers I am lazy, so I thought to myself: “I should write an email bot to automate what we are doing!”

Behold:

#!/usr/bin/perl

use strict;
use warnings;

use Mail::IMAPClient;
use Email::Address;
use Email::Sender::Simple qw(sendmail);
use Data::Dumper::Concise;
use Try::Tiny;

my ($from) = Email::Address->parse('Zip Email Bot <email-bot@ziprecruiter.com>');
my $imap = Mail::IMAPClient->new(
  Server   => 'imap.gmail.com',
  User     => $from->address,
  Password => $ENV{ZIP_EMAIL_BOT_PASS},
  Ssl      => 1,
  Uid      => 1,
) or die 'Cannot connect to imap.gmail.com as ' . $from->address . ": $@";

$imap->select( $ENV{ZIP_EMAIL_BOT_FOLDER} )
  or die "Select '$ENV{ZIP_EMAIL_BOT_FOLDER}' error: ", $imap->LastError, "\n";

for my $msgid ($imap->search('ALL')) {

  require Email::MIME;
  my $e = Email::MIME->new($imap->message_string($msgid));

  # if an error happens after this the email will be forgotten
  $imap->copy( 'processed', $msgid )
    or warn "Could not copy: $@\n";

  $imap->move( '[Gmail]/Trash', $msgid )
    or die "Could not move: $@\n";
  $imap->expunge;

  my @ids = extract_instance_list($e);

  next unless @ids;

  my $email = build_reply(
    $e, Dumper(instance_data(@ids))
  );

  try {
    sendmail($email)
  } catch {
    warn "sending failed: $_";
  };
}

# We ignore stuff in the inbox, stuff we care about gets filtered into another
# folder.
$imap->select( 'INBOX' )
  or die "Select 'INBOX' error: ", $imap->LastError, "\n";

my @emails = $imap->search('ALL');

if (@emails) {
  $imap->move( '[Gmail]/Trash', \@emails )
    or warn "Failed to cleanup inbox: " . $imap->LastError . "\n";
}
$imap->expunge;

$imap->logout
  or die "Logout error: ", $imap->LastError, "\n";


# A lot of this was copy pasted from Email::Reply; I'd use it except it has some
# bugs and I was recommended to avoid it.  I sent patches to resolve the bugs and
# will consider using it directly if those are merged and released.
# -- fREW 22Mar2016
sub build_reply {
  my ($email, $body) = @_;

  my $response = Email::MIME->create;

  # Email::Reply stuff
  $response->header_str_set(From => "$from");
  $response->header_str_set(To => $email->header('From'));

  my ($msg_id) = Email::Address->parse($email->header('Message-ID'));
  $response->header_str_set('In-Reply-To' => "<$msg_id>");

  my @refs = Email::Address->parse($email->header('References'));
  @refs = Email::Address->parse($email->header('In-Reply-To'))
    unless @refs;

  push @refs, $msg_id if $msg_id;
  $response->header_str_set(References => join ' ', map "<$_>", @refs)
    if @refs;

  my @addrs = (
    Email::Address->parse($email->header('To')),
    Email::Address->parse($email->header('Cc')),
  );
  @addrs = grep { $_->address ne $from->address } @addrs;
  $response->header_str_set(Cc => join ', ', @addrs) if @addrs;

  my $subject = $email->header('Subject') || '';
  $subject = "Re: $subject" unless $subject =~ /\bRe:/i;
  $response->header_str_set(Subject => $subject);

  # generation of the body
  $response->content_type_set('text/html');
  $response->body_str_set("<pre>$body</pre>");

  $response
}

sub extract_instance_list {
  my $email = shift;

  my %ids;
  $email->walk_parts(sub {
    my $part = shift;
    return if $part->subparts; # multipart
    return if $part->header('Content-Disposition') &&
      $part->header('Content-Disposition') =~ m/attachment/;

    my $body = $part->body;

    while ($body =~ m/\b(i-[0-9a-f]{8,17})\b/gc) {
      $ids{$1} = undef;
    }
  });

  return keys %ids;
}

sub find_instance {
  my $instance_id = shift;

  my $res;
  # could infer region from the email but this is good enough
  for my $region (qw( us-east-1 us-west-1 eu-west-1 )) {
    $res = try {
      # theoretically we could fetch multiple ids at a time, but if we get the
      # "does not exist" exception we do not want it to apply to one of many
      # instances.
      _ec2($region)->DescribeInstances(InstanceIds => [$instance_id])
        ->Reservations
    } catch {
      # we don't care about this error
      die $_ unless m/does not exist/m;
      undef
    };

    last if $res;
  }

  return $res;
}

sub instance_data {
  return unless @_;
  my %ids = map { $_ => 'not found (no longer exists?)' } @_;

  for my $id (keys %ids) {
    my $res = find_instance($id);

    next unless $res;

    my ($i, $uhoh) = map @{$_->Instances}, @$res;

    next unless $i;

    warn "multiple instances found for one instance id, wtf\n" if $uhoh;

    $ids{$id} = +{
      map { $_->Key => $_->Value }
      @{$i->Tags}
    };
  }

  return \%ids;
}


my %ec2;
sub _ec2 {
  my $region = shift;

  require Paws;

  $ec2{$region} ||= Paws->service('EC2', region => $region );

  $ec2{$region}
}

There’s a lot of code there, but this is the meat of it:

my @ids = extract_instance_list($e);

next unless @ids;

my $email = build_reply(
  $e, Dumper(instance_data(@ids))
);

try {
  sendmail($email)
} catch {
  warn "sending failed: $_";
};

And then the end result is a reply-all to the original email that looks something like this:

Subject: Re: [Retirement Notification] Amazon EC2 Instance scheduled for retirement.

{
  "i-8c288e74" => {
    Level => "prod",
    Name => "send-22",
    Team => "Search"
  }
}

The code above is cool, but the end result is awesome. I don’t log into the AWS console often, and the above means I get to log in even less. This is the kind of tool I love; for the 99% case, it is quiet and simplifies all of our lives. I can see the result on my phone; I don’t have to connect to a VPN or ssh into something; it just works.

colophon

The power went out in the entire city of Santa Monica today, but I was able to work on this blog post (including seeing previews of how it would render) and access the emails that it references thanks to both my email setup and my blog setup. Hurray for software that works without the internet!

Posted Wed, Jun 22, 2016

Vim: Goto File

Vim has an awesome feature that I think is not shown off enough. It’s pretty easy to use and configure, but thankfully many languages have a sensible configuration out of the box.

Vim has this feature that opens a file when you press gf over a filename. On the face of it, it’s only sort of useful. There are a couple settings that make this feature incredibly handy.

path

First and foremost, you have to set your path. Typically when you open a Perl script or module in vim, the path is set to something like this:

  • $(pwd)
  • /usr/include
  • $PERL5LIB
  • And Perl’s default @INC

It’s a good idea to add the path of your current project, for example:

:set path+=lib

So on a typical Linux system, you can type out zlib.h and press gf over it and pull up the zlib headers. The next feature is what really makes it powerful.

suffixesadd and includeexpr

The more basic of the two options is suffixesadd. It is simply a list of suffixes to attempt to add to the filename. So in the example above, if you :set suffixesadd=.h and then type zlib and then press gf on the word, you’ll pull up the header files for zlib. That’s too basic for most modern programming environments though. Here’s the default includeexpr for me when I open a Perl script:

substitute(substitute(substitute(v:fname,'::','/','g'),'->*','',''),'$','.pm','')

Let’s unpack that to make sure we see what’s going on. This may be subtly incorrect syntax, but that’s fine. The point is to communicate what is happening above.

to_open = v:fname

# replace all :: with /
to_open = substitute(to_open,'::','/','g')

# remove any method call (like ->foo)
to_open = substitute(to_open,'->*','','')

# append a .pm
to_open = substitute(to_open,'$','.pm','')

With the above we can find the filename to open. This is the default. You can do even better if you put in a little effort. Here is an idea I’d like to try when I get some time: call a function as the expression, and in the function, if the fname contains ->resultset(...), return the namespaced ResultSet class. I’d need to tweak isfname to allow selecting weird characters, and maybe that would be more problematic than it’s worth, but it’s hard to know before you try. Could be really handy!

Even if you don’t go further with this idea, consider using gf more often. I personally use it (plus CTRL-O as a “back” command) to browse repos and even the Perl modules they depend on.

Posted Tue, Jun 21, 2016

Staring into the Void

Monday of this week either Gmail or OfflineIMAP had a super rare transient bug and duplicated all of the emails in my inbox, twice. I had three copies of every email! It was annoying, but I figured it would be pretty easy to fix with a simple Perl script. I was right; here’s how I did it:

#!/usr/bin/env perl

use 5.24.0;
use warnings;

use Email::MIME;
use IO::All;

my $dir = shift;

my @files = io->dir($dir)->all_files;

my %message_id;

for my $file (@files) {
   my $message_id = Email::MIME->new( $file->all )->header_str('message-id');
   unless ($message_id) {
      warn "No Message-ID for $file\n";
      next;
   }

   $message_id{$message_id} ||= [];
   push $message_id{$message_id}->@*, $file->name;
}

for my $message_id (keys %message_id) {
   my ($keep, @remove) = $message_id{$message_id}->@*;

   say "# keep $keep";
   say "rm $_" for @remove;
}

After running the script above I could eyeball the output and be fairly confident that I was not accidentally deleting everything. Then I just re-ran it and piped the output to sh. Et voilà! The inbox was back to normal, and I felt good about myself.

Then I got nervous

Sometimes when you are programming, you solve real world problems, like what day you’ll get married. Other times, you’re just digging yourself out of the pit that is everything that comes with programming. This is one of those times. I’ve mentioned my email setup before, and I am still very pleased with it. But I have to admit to myself that this problem would never have happened if I were using the web interface that Gmail exposes.

See, while I can program all day, it’s not actually what I get paid to do. I get paid to solve problems, not make more of them and then fix them with code. It’s a lot of fun to write code; when you write code you are making something and you get the nearly instant gratification of seeing it work.

I think code can solve many problems, and is worth doing for sure. In fact I do think the code above is useful and was worth writing and running. But it comes really close to what I like to call “life support” code. Life support code is not code that keeps a person living. Life support code is code that hacks around bugs or lack of features or whatever else, to keep other code running.

No software is perfect; there will always be life support code, incidental complexity, lack of idempotence, and bugs. But that doesn’t mean that I can stop struggling against this fundamental truth and just write / support bad software. I will continue to attempt to improve my code and the code around me, but I think writing stuff like the above is, to some extent, a warning sign.

Don’t just mortgage your technical debt; pay it down. Fix the problems. And keep the real goal in sight; you do not exist to pour your blood into a machine: solve real problems.

Posted Thu, Jun 16, 2016

Vim Session Workflow

Nearly a year ago I started using a new vim workflow leveraging sessions. I’m very pleased with it and would love to share it with anyone who is interested.

Session Creation

This is what really made sessions work for me. Normally in vim, storing a session (which saves almost the entire state of the editor: all open windows, buffers, etc.) has to be done by hand with the :mksession command. While that works, it means that you are doing that all the time. Tim Pope released a plugin called Obsession which resolves this issue.

When I use Obsession I simply run this command if I start a new project: :Obsess ~/.vvar/sessions/my-cool-thing. That will tell Obsession to automatically keep the session updated. I can then close vim, and if I need to pick up where I left off, I just load the session.

Lately, because I’m dealing with stupid kernel bugs, I have been using :mksession directly as I cannot seem to efficiently make session updating reliable.

Session Loading

I store my sessions (and really all files that vim generates to function) in a known location. The reasoning here is that I can then enumerate and select a session with a tool. I have a script that uses dmenu to display a list, but you could use one of those hip console based selectors too. Here’s my script:

#!/bin/zsh

exec gvim -S "$(find ~/.vvar/sessions -maxdepth 1 -type f | dmenu)"

That simply starts gvim with the selected session. If the session was created with Obsession, it will continue to automatically update.


This allows me to easily stop working on a given project and pick up exactly where I left off. It would be perfect if my computer would stop crashing; hopefully it’s perfect for you!

Posted Thu, Jun 9, 2016

DBI Caller Info

At ZipRecruiter we have a system for appending metadata to queries generated by DBIx::Class. About a month ago I posted about bolting timeouts onto MySQL and in the referenced code I mentioned parsing said metadata. We are depending on that metadata more and more to set accurate timeouts on certain page types.

Adding Metadata to DBI Queries

Because of our increased dependence on query metadata, I decided today that I’d look into setting the metadata at the DBI layer instead of the DBIx::Class layer. This not only makes debugging certain queries easier, but more importantly allows us to give extra grace to queries coming from certain contexts.

First we define the boilerplate packages:

package ZR::DBI;

use 5.14.0;
use warnings;

use base 'DBI';

use ZR::DBI::db;
use ZR::DBI::st;

1;
package ZR::DBI::st;

use 5.14.0;
use warnings;

use base 'DBI::st';

1;

Next we intercept the prepare method. In this example we only grab the innermost call frame. At work we not only walk backwards through the call frames based on a regex on the filename, we also have a hash that adds extra data, like what controller and action are being accessed when in a web context.

package ZR::DBI::db;

use 5.14.0;
use warnings;

use base 'DBI::db';

use JSON::XS ();

sub prepare {
  my $self = shift;
  my $stmt = shift;

  my ($class, $file, $line) = caller;
  # caller() without arguments only returns three values; the calling
  # sub's name needs caller(1)
  my $sub = (caller(1))[3];

  $stmt .= " -- ZR_META: " . encode_json({
    class => $class,
    file  => $file,
    line  => $line,
    sub   => $sub,
  }) . "\n";

  $self->SUPER::prepare($stmt, @_);
}

1;
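As mentioned, at work we walk backwards through the call stack rather than taking the innermost frame. A rough sketch of that idea, with an invented helper name and filter regex, might look like this:

sub _interesting_caller {
  my $i = 0;

  # skip database plumbing so the metadata points at application code
  while (my ($class, $file, $line) = caller($i)) {
    return { class => $class, file => $file, line => $line }
      unless $class =~ /^(?:ZR::DBI|DBIx?\b)/ || $file =~ m{\bDBIx?\b};
    $i++;
  }

  return {};
}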

Finally use the subclass:

my $dbh = DBI->connect($dsn, $user, $password, {
    RaiseError         => 1,
    AutoCommit         => 1,

    RootClass          => 'ZR::DBI',
});

The drawback of the above is that it could destroy (and maybe is destroying) the caching of prepared statements. In our system that doesn’t seem to be very problematic, but I suspect it depends on the RDBMS and workload. Profile your system before blindly following these instructions.

Wow that’s all there is to it! I expected this to be a lot of work, but it turns out Tim Bunce had my back and made this pretty easy. It’s pretty great when something as central as database access has been standardized!

Posted Wed, Jun 8, 2016

My Custom Keyboard

A few years ago I made my own keyboard, specifically an ErgoDox. I’ve been very pleased with it in general and I have finally decided to write about it.

ErgoDox

The ErgoDox is sortav an open-source cross between the Kinesis Advantage and the Kinesis Freestyle. It’s two effectively independent halves that have a similar layout to the Advantage, especially the fact that the keys are in a matrix layout. If you don’t know what that means, think about the layout of a numpad and how the keys are directly above each other as opposed to staggered like the rest of the keyboard. That’s a matrix layout.

The other major feature of the ErgoDox is the thumb clusters. Instead of delegating various common keys like Enter and Backspace to pinky fingers, many keys are pressed by a thumb. Of course the idea is that the thumb is stronger and more flexible and thus more able to deal with consistent usage. I am not a doctor and can’t really evaluate the validity of these claims, but it’s been working for me.

The ErgoDox originally only shipped as a kit, so I ended up soldering all of the diodes, switches, etc together on a long hot day in my home office with a Weller soldering iron I borrowed from work. Of course because I had not done a lot of soldering or even electrical stuff I first soldered half of the diodes on backwards and had to reverse them. That was fun!

Firmware

My favorite thing about my keyboard is that it runs my own custom firmware. It has a number of interesting features, but the coolest one is that when the operator holds down either a or ; the following keys get remapped:

  • h becomes ←
  • j becomes ↓
  • k becomes ↑
  • l becomes →
  • w becomes Ctrl + →
  • b becomes Ctrl + ←
  • y becomes Ctrl + C
  • p becomes Ctrl + V
  • d becomes Ctrl + X
  • u becomes Ctrl + Z
  • x becomes Delete

For those who can’t tell, this is basically a very minimal implementation of vi in the hardware of the keyboard. I can use this in virtually any context. The fact that keys that are not modifiers at all are able to be used in such a manner is due to the ingenuity of TMK.

Keycaps

When I bought the ErgoDox kit from MassDrop I had the option of either buying blank keycaps in a separate but concurrent drop, or somehow scrounging up my own keycaps somewhere else. After a tiny bit of research I decided to get the blank keycaps.

Zodiak

I had the idea for this part of my keyboard after having the keyboard for just a week. I’d been reading Homestuck which inspired me to use the Zodiak for the function keys (F1 through F12.)

After having the idea I emailed Signature Plastics, who make a lot of keycaps, about pricing of some really svelte keys. Note that this is three years ago so I expect their prices are different. (And really the whole keycap business has exploded so who knows.) Here was their response:

In our DCS family, the Cherry MX compatible mount is the 4U. Will all 12 of the Row 5 keycaps have the same text or different text on them? Pricing below is based on each different keycap text. As you will see our pricing is volume sensitive, so if you had a few friends that wanted the same keys as you, you would be better off going that route.

  • 1 pc $98.46 each
  • 5 pcs $20.06 each
  • 10 pcs $10.26 each
  • 15 pcs $6.99 each
  • 25 pcs $4.38 each
  • 50 pcs $2.43 each

Please note that our prices do not include shipping costs or new legend fees should the text you want not be common text. Let me know if you need anything else!

So to be absolutely clear, if I were to get a set all by myself the price would exceed a thousand dollars, for twelve keys. I decided to start the process of setting up a group buy. I’m sad to say that I can’t find the forum where I initiated that. I thought it was GeekHack but there’s no post from me before I had the Zodiak keys.

Anyway just a couple of days after I posted on the forum I got this email from Signature Plastics:

I have some good news! It appears your set has interested a couple people in our company and we have an offer we were wondering if you would consider. Signature Plastics would like to mold these keycaps and place them on our marketplace. In turn for coming up with the idea (and hopefully helping with color selection and legend size) we will offer you a set free of charge… What do you think?

Of course I was totally down. I in fact ordered an extra set myself since I ended up making two of these keyboards eventually! Here’s a screenshot of the keycaps from their store page:

Keycaps

For those who don’t know, these keys are double-shot, which means each key is actually two pieces of plastic: an orange piece (the legend,) and a black piece which contains the legend. This means that no matter how much I type on them, the legend won’t wear off even after twenty years of usage. Awesome.

Stealth

A couple of months after building the keyboard I came to the conclusion that I needed legends on all of the keys. I can touch type just fine, but when doing weird things like pressing hotkeys outside of the context of programming or writing I need the assistance of a legend. So I decided to make my own stealth keycaps.

You can see the original post on GeekHack here.

Here are the pictures from that thread:

Left

Right

Also, if you didn’t already, I recommend reading that short thread. The folks on GeekHack are super friendly, positive, and supportive. If only the rest of the internet could be half as awesome.

Miscellany

The one other little thing I’ve done to the keyboard is to add small rubber O-rings underneath each key. I have Cherry blues (which are supposed to click like an IBM Model M) but with the O-rings the keyboard is both fairly quiet and feels more gentle on my hands. A full depress of a key, though not required with a mechanical switch, is cushioned by the rings.


My keyboard is one of the many tools that I use on a day to day basis to get my job done. It allows me to feel more efficient and take pride in the tools that I’ve built to save myself time and hopefully pain down the road. I have long had an unfinished post in my queue about how all craftspersons should build their own tools, and I think this is a fantastic example of that fine tradition.

Go. Build.

Posted Sat, Jun 4, 2016

Serverless

A big trend lately has been the rise of “serverless” software. I’m not sure I’m the best person to define that term, but my use of the term generally revolves around avoiding a virtual machine (or a real machine I guess.) I have a server on Linode that I’ve been slowly removing services from in an effort to get more “serverless.”

It’s not about chasing fads. I am a professional software engineer and I mostly use Perl; I sorta resist change for the sake of it.

It’s mostly about the isolation of the components. As it stands today my server is a weird install of Debian where the kernel is 64 bit and the userspace is 32 bit. This was fine before, but now it means I can’t run Docker. I had hoped to migrate various parts of my own server to containers to be able to more easily move them to OVH when I eventually leave Linode, but I can’t now.

Services

I could just rebuild the server, but then all of these various services that run on my server would be down for an unknown amount of time. To make this a little more concrete, here are the major services that ran on my blog at the beginning of 2016:

  1. Blog (statically served content from Apache)
  2. Lizard Brain (Weird automation thing)
  3. IRC Client (Weechat)
  4. RSS (An install of Tiny Tiny RSS; PHP on Apache)
  5. Feeds (various proxied RSS feeds that I filter myself)
  6. Git repos (This blog and other non-public repositories)
  7. SyncThing (Open source decentralized DropBox like thing)

The above are ordered in terms of importance. If SyncThing doesn’t work for some reason, I might not even notice. If my blog is down I will be very angsty.

Blog

I’ve already posted about when I moved my blog off Linode. That’s been a great success for me. I am pleased that this blog is much more stable than it was before; it’s incredibly secure, despite the fact that it’s “on someone else’s computer;” and it’s fast and cheap!

Feeds

After winning a sweet skateboard from Heroku I decided to try out their software. It’s pretty great! The general idea is that you write some kind of web based app, and it will get run in a container on demand by Heroku, and after a period of inactivity, the app will be shut down.

This is a perfect way for my RSS proxy to run, and it simplified a lot of stuff. I had written code to automatically deploy when I push to GitHub. Heroku already does that. I never took care of automating the installation of deps, but Heroku (or really miyagawa) did.

While I had certificates automatically getting created by LetsEncrypt, Heroku provides the same functionality and I will never need to baby-sit it.

And finally, because my RSS proxy is so light (accessed a few times a day) it ends up being free. Awesome. Thanks Heroku.

AWS Lambda

I originally tried using Lambda for this, but it required a rewrite and I am depending on some non-trivial infrastructural dependencies here. While I would have loved to port my application to Python and have it run for super cheap on AWS Lambda, it just was not a real option without more porting than I am prepared to do right now.

RSS and Git Repos

Tiny Tiny RSS is software that I very much have a love/hate relationship with. Due to the way the community works, I was always a little nervous about using it. After reading a blog post by Filippo Valsorda about Piwik I decided to try out Sandstorm.io on the Oasis. Sandstorm.io is a lot like Heroku, but it’s more geared toward hosting open source software for individuals, with a strong emphasis on security.

You know that friend you have who is a teacher and likes to blog about soccer? Do you really want that friend installing WordPress on a server? You do not. If that friend had an Oasis account, they could use the WordPress grain and almost certainly never get hacked.

I decided to try using Oasis to host my RSS reader and so far it has been very nice. I had one other friend using my original RSS instance (it was in multiuser mode) and he seems to have had no issues with using Oasis either. This is great; I now have a frustrating to maintain piece of software off of my server and also I’m not maintaining it for two. What a load off!

Oasis also has a grain for hosting a git repo, so I have migrated the storage of the source repo of this blog to the Oasis. That was a fairly painless process, but one thing to realize is that each grain is completely isolated, so when you set up a git repo grain it hosts just the one repo. If you have ten repos, you’d be using ten grains. That would be enough that you’d end up paying much more for your git repos.

I’ll probably move my Piwik hosting to the Oasis as well.

Oh also, it’s lightweight enough that it’s free! Thanks Oasis.

Lizard Brain and IRC Client

Lizard Brain is very much a tool that is glued into the guts of a Unix system. One of its core components is atd. As of today, Sandstorm has no scheduler that would allow LB to run there. Similarly, while Heroku does have a scheduler, its granularity is terrible and it’s much more like cron (it’s periodic) than atd (a specific event in time.) Amazon does have scheduled events for Lambda, but unlike Heroku and Sandstorm, that would require a complete rewrite in Python, Java, or JavaScript. I suspect I will rewrite in Python; it’s only about 800 lines, but it would be nice if I didn’t have to.

Another option would be for me to create my own atd, but then I’d have it running in a VM somewhere and if I have a VM running somewhere I have a lot less motivation to move every little service off of my current VM.

A much harder service is IRC. I use my VM as an IRC client so that I will always have logs of conversations that happened when I was away. Over time this has gotten less and less important, but there are still a few people who will reach out to me while I’m still asleep and I’m happy to respond when I’m back. As of today I do not see a good replacement for a full VM just for IRC. I may try to write some kind of thing to put SSH + Weechat in a grain to run on Sandstorm.io, but it seems like a lot of work.

An alternate option, which I do sortav like, is finding some IRC client that runs in the browser and also has an API, so I can use it from my phone, but also have a terminal interface.

The good news is that my Linode will eventually “expire” and I’ll probably get a T2 Nano EC2 instance, which costs about $2-4/month and is big enough (500 MB of RAM) to host an IRC client. Even on my current Linode I’m using only 750 MB of RAM, and if you exclude MySQL (used for TTRSS, still haven’t uninstalled it) and SyncThing it’s suddenly less than 500 MB. Cool!

SyncThing

SyncThing is cool, but it’s not a critical enough part of my setup to require a VM. I am likely to just stop using it since I’ve gone all the way and gotten a paid account for DropBox.

Motivations

A lot of the above are specifics that are almost worthless to most of you. There are real reasons to move to a serverless setup, and I think they are reasons that everyone can appreciate.

Security

Software is consistently and constantly shown to be insecure. Engineers work hard to make good software, but it seems almost impossible for sufficiently complex software to be secure. I will admit that all of the services discussed here are also software, but because of their very structure the user is protected from a huge number of attacks.

Here’s a very simple example: on the Oasis, I have a MySQL instance inside of the TTRSS grain. On my Linode the MySQL server could potentially be misconfigured to be listening on a public interface, maybe because some PHP application installer did that. On the Oasis that’s not even possible, due to the routing of the containers.

Similarly, on Heroku, if there were some crazy kernel bug that needed to be resolved, because my application is getting spun down all the time, there are plenty of chances to reboot the underlying virtual machines without me even noticing.

Isolation

Isolation is a combination of a reliability and security feature. When it comes to security it means that if my blog were to get hacked, my TTRSS instance is completely unaffected. Now I have to admit this is a tiny bit of a straw man, because if I set up each of my services as separate users they’d be fairly well isolated. I didn’t do that though because that’s a hassle.

The reliability part of isolation is a lot more considerable though. If I tweak the Apache site config for TTRSS and run /etc/init.d/apache restart and had a syntax error, all of the sites being hosted on my machine go down till I fix the issue. While I’ve learned various ways to ensure that does not happen, “be careful” is a really stupid way to ensure reliability.

Cost

I make enough money to pay for a $20/mo Linode, but it just seems like a waste of money that could be put to better uses. Without a ton of effort I can cut my total spend in half, and I suspect I could eventually drop it to about 10%. As mentioned already, my blog is costing less than a dime a month and is rock-solid.

Problems

Nothing is perfect though. While I am mostly sold on the serverless phenomenon, there are some issues that I think need solving before it’s an unconditional win.

Storage (RDBMS etc)

This sorta blows my mind. With the exception of Sandstorm.io, which is meant for small amounts of users for a given application, no one really offers a cheap database. Heroku has a free database option that I couldn’t have used with my RSS reader, and the for-pay option would cost about half what I pay for my VM, just for the database.

Similarly AWS offers RDS, but that’s really just the cost of an EC2 VM, so at the barest that would be a consistent $2/mo. If you were willing to completely rewrite your application you might be able to get by using DynamoDB, but in my experience using it at work it can be very frustrating to tune for.

I really think that someone needs to come in and do what Lambda did for code or DynamoDB did for KV stores, but for a traditional database. Basically, as it stands today, if you have a database that is idle you pay the same price as you would for a database that is pegging its CPU. I want a traditional database that is billed based on usage.

Billing Models

Speaking of billing a database based on usage, more things need to be billed based on usage! I am a huge fan of most of the billing models on AWS, where you end up paying for what you use. For someone self hosting for personal consumption this almost always means that whatever you are doing will cost less than any server you could build. I would gladly pay for my Oasis usage, but a jump from free to $9 is just enough for me to instead change my behaviour and instead spend that money elsewhere.

If someone who works on Sandstorm.io is reading this and cares: I would gladly pay hourly per grain.

I have not yet used enough of Heroku to need to use the for pay option there, but it looks like I could probably use it fairly cheaply.

haters

Of course there will be some people who read this who think that running on anything but your own server is foolish. I wonder if those people run directly on the metal, or just assume that all of the Xen security bugs have been found. I wonder if those people regularly update their software for security patches and know to restart all of the various components that need to be restarted. I wonder if those people value their own time and money.


Hopefully before July I will only be using my server for IRC and Lizard Brain. There’s no rush to migrate since my Linode has almost 10 months before a rebill cycle. I do expect to test how well a T2 Nano works for my goals in the meantime though, so that I can easily pull the trigger when the time comes.

Posted Wed, Jun 1, 2016

Iterating over Chunks of a Diff in Vim

Every now and then at work I’ll make broad, sweeping changes in the codebase. The one I did recently was replacing all instances of print STDERR "foo\n" with warn "foo\n". There were about 160 instances in all that I changed. After discussing it more with my boss, we decided that instead of blindly replacing all those print statements with warns (which, for those who don’t know, are easier to intercept and log) we should just log to the right log level.

Enter Quickfix

Quickfix sounds like some kind of bad guy from a slasher movie to me, but it’s actually a super handy feature in Vim. Here’s what the manual says:

Vim has a special mode to speedup the edit-compile-edit cycle. This is inspired by the quickfix option of the Manx’s Aztec C compiler on the Amiga. The idea is to save the error messages from the compiler in a file and use Vim to jump to the errors one by one. You can examine each problem and fix it, without having to remember all the error messages.

More concretely, the quickfix commands end up giving the user a list of locations. I tend to use the quickfix list most commonly with Fugitive. You can run the command :Ggrep foo and the quickfix list will contain all of the lines that git found containing foo. Then, to iterate over those locations you can use :cnext, :cprev, :cwindow, and many others, to interact with the list.

I have wanted a way to populate the quickfix list with the locations of all of the chunks that are in the current modified files for a long time, and this week I decided to finally do it.

First off, I wrote a little tool to parse diffs and output locations:

#!/usr/bin/env perl

use strict;
use warnings;

my $filename;
my $line;
my $offset = 0;
my $printed = 0;
while (<STDIN>) {
   if (m(^\+\+\+ b/(.*)$)) {
      $printed = 0;
      $filename = $1;
   } elsif (m(^@@ -\d+(?:,\d+)? \+(\d+))) {
      $line = $1;
      $offset = 0;
      $printed = 0;
   } elsif (m(^\+(.*)$)) {
      my $data = $1 || '-';
      print "$filename:" . ($offset + $line) . ":$data\n"
         unless $printed;
      $offset++;
      $printed = 1;
   } elsif (m(^ )) {
      $printed = 0;
      $offset++;
   }
}

The general usage is something like: git diff | diff-hunk-list, and the output will be something like:

app/lib/ZR/Plack/Middleware/AccessLog.pm:195:  local $SIG{USR1} = sub {
bin/zr-plack-reaper:22:-
bin/zr-plack-reaper:29:sub timeout { 120 }

The end result is a location for each new set of lines in a given diff. That means that deleted lines will not be included with this tool. Another tool or more options for this tool would have to be made for that functionality.
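As a rough sketch (not a tool I actually use), one way to also report deletions is to print the new-file position that each run of removed lines used to occupy:

#!/usr/bin/env perl

use strict;
use warnings;

my ($filename, $line, $offset, $printed) = ('', 0, 0, 0);

while (<STDIN>) {
   if (m(^\+\+\+ b/(.*)$)) {
      $filename = $1;
      $printed  = 0;
   } elsif (m(^@@ -\d+(?:,\d+)? \+(\d+))) {
      ($line, $offset, $printed) = ($1, 0, 0);
   } elsif (m(^--- )) {
      # old-file header; ignore it so it is not mistaken for a deletion
   } elsif (m(^-(.*)$)) {
      # deleted lines are gone from the new file, so report the line the
      # following context now sits on; one location per run of deletions
      print "$filename:" . ($offset + $line) . ":removed: $1\n"
         unless $printed;
      $printed = 1;
   } elsif (m(^[+ ])) {
      $printed = 0;
      $offset++;
   }
}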

Then, I added the following to my vimrc:

command Gdiffs cexpr system('git diff \| diff-hunk-list')

So now I can simply run :Gdiffs and iterate over all of my changes, possibly tweaking them along the way!

Super Secret Bonus Content

The Quickfix is great, but there are a couple other things that I think really round out the functionality.

First: the Quickfix list is global per session, so if you do :Gdiffs and then :Ggrep to refer to some other code, you’ve blown away the original quickfix list. There’s another list called the location list, which is scoped to a window. It’s also very useful; its commands tend to start with l instead of c.

Second: There is another Tim Pope plugin called unimpaired which adds a ton of useful mappings, including [q and ]q to go back and forth in the quickfix list, and [l and ]l to go back and forth in the location list. Please realize that the plugin does way more than just those two things, but I do use it for those the most.

Posted Wed, May 25, 2016

OSCON 2016

ZipRecruiter, where I work, generously pays for each engineer to go at least one conference a year. I have gone to YAPC every year since 2009 and would not skip it, except my wife is pregnant with our second child and will be due much too close to this year’s YAPC (or should I say instead: The Perl Conference?) for me to go.

There were a lot of conferences that I wanted to check out; PyCon, Monitorama, etc etc, but OSCON was the only one that I could seem to make work out with my schedule. I can only really compare OSCON to YAPC and to a lesser extent SCALE and the one time I went to the ExtJS conference (before it was called Sencha,) so my comparisons may be a little weird.

Something Corporate

OSCON is a super corporate conference, which is surprising given that its name includes Open Source. For the most part this is fine; it means that there is a huge amount of swag (more on that later,) lots of networking to be done, and many free meals. On the other hand OSCON is crazy expensive; I would argue not worth the price. I got the lowest tier, since my wife didn’t want me to be gone for the full four days (and probably six including travel,) and it cost me a whopping twelve hundred dollars. Of course ZipRecruiter reimbursed me, but for comparison, YAPC normally costs $200 max.

On top of that there were what are called “sponsored talks.” I was unfamiliar with this concept but the basic idea is that a company can pay a lot of money and be guaranteed a slot, which is probably a keynote, to sortav shill their wares. I wouldn’t mind this if it weren’t for the fact that these talks, as far as I could tell, were universally bad. The one that stands out the most was from IBM, with this awesome line (paraphrased:)

Oh if you don’t use Swagger.io you’re not really an engineer. Maybe go back and read some more Knuth.

Swag

At YAPC you tend to get 1-3 shirts, some round tuits, and maybe some stickers. At OSCON I avoided shirts and ended up with six; I got a pair of socks, a metal bottle, a billion pretty awesome stickers, a coloring book, three stress toys, and a THOUSAND DOLLAR SKATEBOARD. To clarify, not everyone got the skateboard; the deal was that you had to get a Heroku account (get socks!) run a node app on your laptop (get shirt!) and then push it up to Heroku (get entered into drawing!) Most people gave up at step two because they had tablets or something, but I did it between talks because that all was super easy on my laptop. I actually was third in line after the drawing, but first and second never showed. Awesome!

The Hallway Track

For me the best part of any conference is what is lovingly called “the hallway track.” The idea is that the hallway, where socializing and networking happen, is equally important to all the other tracks (like DevOps, containers, or whatever.) I really enjoy YAPC’s hallway track, though a non-trivial reason is that I already have many friends in the Perl and (surprisingly distinct) YAPC world. On top of that YAPC tends to be in places that are very walkable, so it’s easy to go to a nice restaurant or bar with new friends.

I was pleasantly surprised by the OSCON hallway track. It was not as good as YAPC’s, but it was still pretty awesome. Here are my anecdotes:

Day 1 (Wed)

At lunch I hung out with Paul Fenwick and a few other people, which was pretty good. Chatting with Paul is always great and of course we ended up talking about ExoBrain and my silly little pseudoclone: Lizard Brain.

At dinner I decided to take a note from Fitz Elliot’s book, who once approached me after I did a talk and hung out with me a lot during the conference. I had a lot of good conversations with Fitz and I figured that maybe I could be half as cool as him and do the same thing. The last talk I went to was about machine learning and the speaker, Andy Kitchen, swerved into philosophy a few times, so I figured we’d have a good time and get along if I didn’t freak him out too much by asking if we could hang out. I was right, we (him, his partner Laura Summers, a couple other guys, and I) ended up going to a restaurant and just having a generally good time. It was pretty great.

Day 2 (Thu)

At lunch on Thursday I decided to sit at the Perl table and see who showed up. Randal Schwartz, who I often work with, was there, which was fun. A few other people were there. Todd Rinaldo springs to mind. I’ve spoken to him before, but this time we found an important common ground in trying to reduce memory footprints. I hope to collaborate with him to an extent and publish our results.

Dinner was pretty awesome. I considered doing the same thing I did on Wednesday, but I thought it’d be hugely weird to ask the girl who did the last talk I saw if she wanted to get dinner. That means something else, usually. So I went to the main area where people were sorta congregating and went to greet some of the Perl people that I recognized (Liz, Wendy, David H. Adler.) They were going to Max’s Wine Bar and ended up inviting me, and another girl whose name I sadly cannot remember. Larry Wall (who invented Perl,) and his wife Gloria and one of his sons joined us, which was pretty fun. At the end of dinner (after I shared an amazing pair of desserts with Gloria) Larry and Wendy fought over who would pay the bill, and Larry won. This is always pretty humbling and fun. The punchline was that the girl who came with us didn’t know who Larry was, because she was mostly acquainted with Ruby. When Wendy told her there were many pictures taken. It was great.

Day 3 (Fri)

Most of Friday I tried to chill and recuperate. I basically slept, packed, went downtown to get lunch and coffee, and then waited for a cab to the airport. Then when I got to the airport I was noticed by another OSCON attendee (Julie Gunderson) because I was carrying the giant branded skateboard. She was hanging out with AJ Bowen and Jérôme Petazzoni, and they were cool with me tagging along with them to get a meal before we boarded the plane. It’s pretty cool that we were able to have a brief last hurrah after the conference was completely over.

Perl

One thing that I was pretty disappointed in was the general reaction when I mentioned that I use Perl. I have plenty of friends in Texas who think poorly of Perl, but I had assumed that was because they mostly worked on closed source software. The fact that a conference that was originally called The Perl Conference would end up encouraging such an anti-Perl attitude is very disheartening.

Don’t get me wrong, Perl is not perfect, but linguistic rivalries only alienate people. I would much rather you tell me some exciting thing you did with Ruby than say “ugh, why would someone build a startup on Perl?” I have a post in the queue about this, so I won’t say a lot more about this. If you happen to read this and are a hater, maybe don’t be a hater.


Overall the conference was a success for me. If I had to choose between a large conference like OSCON and a small conference like YAPC, I’d choose the latter. At some point I’d like to try out the crazy middle ground of something like DefCon where it’s grass roots but not corporate. Maybe in a few years!

Posted Fri, May 20, 2016

Faster DBI Profiling

Nearly two months ago I blogged about how to do profiling with DBI, which of course was about the same time we did this at work.

At the same time there was a non-trivial slowdown in some pages on the application. I spent some time trying to figure out why, but never made any real progress. On Monday of this week Aaron Hopkins pointed out that we had set $DBI::Profile::ON_DESTROY_DUMP to an empty code reference. If you take a peek at the code you’ll see that setting this to a coderef is much less efficient than it could be.

So the short solution is to set $DBI::Profile::ON_DESTROY_DUMP to false.
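In other words, somewhere early in the program, before any handles get destroyed:

# skip the expensive dump-on-DESTROY work entirely
$DBI::Profile::ON_DESTROY_DUMP = undef;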

A better solution is to avoid the global entirely by making a simple subclass of DBI::Profile. Here’s how I did it:

package MyApp::DBI::Profile;

use strict;
use warnings;

use parent 'DBI::Profile';

sub DESTROY {}

1;

This correctly causes the destructor to do nothing, and allows us to avoid setting globals. If you are profiling all of your queries like we are, you really should do this.

Posted Wed, May 18, 2016

Setting up Let's Encrypt and Piwik

Late last week I decided that I wanted to set up Piwik on my blog. I’ll go into how to do that later in the post, but first I ran into a frustrating snag: I needed another TLS certificate. Normally I use StartSSL, because I’ve used them in the past, and I actually started to attempt to go down the path of getting another certificate through them this time, but I ran into technical difficulties that aren’t interesting enough to go into.

Let’s Encrypt

I decided to finally bite the bullet and switch to Let’s Encrypt. I’d looked into setting it up before but the default client was sorta heavyweight, needing a lot of dependencies installed and maybe more importantly it didn’t support Apache. On Twitter at some point I read about acmetool, a much more predictable tool with automated updating of certificates built in. Here’s how I set it up:

Install acmetool

I’m on Debian, but since it’s a static binary, as the acmetool documentation states, the Ubuntu repository also works:

sudo sh -c \
  "echo 'deb http://ppa.launchpad.net/hlandau/rhea/ubuntu xenial main' > \
      /etc/apt/sources.list.d/rhea.list"
sudo apt-key adv \
  --keyserver keyserver.ubuntu.com \
  --recv-keys 9862409EF124EC763B84972FF5AC9651EDB58DFA
sudo apt-get update
sudo apt-get install acmetool

Configure acmetool

First I ran sudo acmetool quickstart. My answers were:

  • 1, to use the Live Let’s Encrypt servers
  • 2, to use the PROXY challenge requests

And I think it asked to install a cronjob, which I said yes to.

Get some certs

This is assuming you have your DNS configured so that your hostname resolves to your IP address. Once that’s the case you should simply be able to run this command to get some certs:

sudo acmetool want \
  piwik.afoolishmanifesto.com \
     st.afoolishmanifesto.com \
    rss.afoolishmanifesto.com

Configure Apache with the certs

There were a couple little things I had to do to get multiple certificates (SNI) working on my server. First off, /etc/apache2/ports.conf needs to look like this:

NameVirtualHost *:443
Listen 443

Note that my server is TLS only; if you support unencrypted connections obviously the above will be different.

Next, edit each site that you are enabling. So for example, my /etc/apache2/sites-available/piwik looks like this:

<VirtualHost *:443>
        ServerName piwik.afoolishmanifesto.com
        ServerAdmin webmaster@localhost

        SSLEngine on
        SSLCertificateFile      /var/lib/acme/live/piwik.afoolishmanifesto.com/cert
        SSLCertificateKeyFile   /var/lib/acme/live/piwik.afoolishmanifesto.com/privkey
        SSLCertificateChainFile /var/lib/acme/live/piwik.afoolishmanifesto.com/chain

        ProxyPass "/.well-known/acme-challenge" "http://127.0.0.1:402/.well-known/acme-challenge"
        DocumentRoot /var/www/piwik
        <Location />
                Order allow,deny
                allow from all
        </Location>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn

        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

I really like that the certificate files end up in a place that is predictable and clear.

After doing the above configuration, you should be able to restart apache (sudo /etc/init.d/apache2 restart), access your website, and see it using a freshly minted Let’s Encrypt certificate.

Configure auto-renewal

Let’s Encrypt certificates do not last very long at all. Normally a cheap or free certificate will last a year, a more expensive one will last two years, and some special expensive EV certs can last longer, with I think a normal max of five? The Let’s Encrypt ones last ninety days. With an expiration so often, automation is a must. This is where acmetool really shines. If you allowed it to install a cronjob it will periodically renew certificates. That’s all well and good but your server needs to be informed that a new certificate has been installed. The simplest way to do this is to edit the /etc/default/acme-reload file and set SERVICES to apache2.

Piwik

The initiator of all of the above was to set up Piwik. If you haven’t heard of Piwik, it’s basically a locally hosted Google Analytics. The main benefit being that people who use various ad-blockers and privacy tools will not be blocking you, and reasonably so as your analytics will not leave your server.

The install was fairly straight forward. The main thing I did was follow the instructions here and then when it came to the MySQL step I ran the following commands as the mysql root user (mysql -u root -p):

CREATE DATABASE piwik;
CREATE USER 'piwik'@'localhost' IDENTIFIED BY 'somepassword';
use piwik;
GRANT ALL PRIVILEGES ON *.* TO 'piwik'@'localhost';

So now that I have Piwik I can see interesting information much more easily than before, where I wrote my own little tools to parse access logs. Pretty neat!

Posted Sat, May 14, 2016

Rage Inducing Bugs

I have run into a lot of bugs lately. Maybe it’s actually a normal amount, but these bugs, especially taken together, have caused me quite a bit of rage. Writing is an outlet for me and at the very least you can all enjoy the show, so here goes!

X11 Text Thing

I tweeted about this one a few days ago. The gist is that, sometimes, when coming back from suspend, font data in video memory gets corrupted. I have a theory that it has to do with switching between X11 and DRI (the old school TTYs), but it is not the most reproducible thing in the world, so this is where I’ve had to leave it.

Firefox

I reported a bug against Firefox recently about overlay windows not getting shown. There is a workaround for this, and the Firefox team (or at least a Firefox individual, Karl Tomlinson) has been willing to look at my errors and have a dialog with me. I have a sinking feeling that this could be a kernel driver bug or maybe a GTK3 bug, but I have no idea how to verify that.

Vim SIGBUS

Vim has been crashing on my computer for a while now. I turned on coredumps for gvim only so that I could repro it about three weeks ago, and I finally got the coveted core yesterday. I dutifully inspected the core dump with gdb and got this:

Program terminated with signal SIGBUS, Bus error.
#0  0x00007fa650515757 in ?? ()

Worthless. I knew I needed debugging symbols, but it turns out there is no vim-dbg. A while ago (like, eight years ago) Debian (and thus Ubuntu) started storing debugging symbols in a completely separate repository. Thankfully a Debian developer, Niels Thykier, was kind enough to point this out to me, so I was able to install the debugging symbols. If you want to do that yourself you can follow the instructions here, but I have to warn you: you will get errors, because I don’t think Ubuntu has put much effort into making this work well.

After installing the debugging symbols I got this much more useful backtrace:

#0  0x00007fa650515757 in kill () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555fad98c273 in may_core_dump () at os_unix.c:3297
#2  0x0000555fad98dd20 in may_core_dump () at os_unix.c:3266
#3  mch_exit (r=1) at os_unix.c:3263
#4  <signal handler called>
#5  in_id_list (cur_si=<optimized out>, cur_si@entry=0x555fb0591700, list=0x6578655f3931313e, 
    ssp=ssp@entry=0x555faf7497a0, contained=0) at syntax.c:6193
#6  0x0000555fad9fb902 in syn_current_attr (syncing=syncing@entry=0, displaying=displaying@entry=0, 
    can_spell=can_spell@entry=0x0, keep_state=keep_state@entry=0) at syntax.c:2090
#7  0x0000555fad9fc1b4 in syn_finish_line (syncing=syncing@entry=0) at syntax.c:1781
#8  0x0000555fad9fcd3f in syn_finish_line (syncing=0) at syntax.c:758
#9  syntax_start (wp=0x555faf633720, lnum=3250) at syntax.c:536
#10 0x0000555fad9fcf45 in syn_get_foldlevel (wp=0x555faf633720, lnum=lnum@entry=3250) at syntax.c:6546
#11 0x0000555fad9167e9 in foldlevelSyntax (flp=0x7ffe2b90beb0) at fold.c:3222
#12 0x0000555fad917fe8 in foldUpdateIEMSRecurse (gap=gap@entry=0x555faf633828, level=level@entry=1, 
    startlnum=startlnum@entry=1, flp=flp@entry=0x7ffe2b90beb0, 
    getlevel=getlevel@entry=0x555fad9167a0 <foldlevelSyntax>, bot=bot@entry=7532, topflags=2)
    at fold.c:2652
#13 0x0000555fad918dbf in foldUpdateIEMS (bot=7532, top=1, wp=0x555faf633720) at fold.c:2292
#14 foldUpdate (wp=wp@entry=0x555faf633720, top=top@entry=1, bot=bot@entry=2147483647) at fold.c:835
#15 0x0000555fad919123 in checkupdate (wp=wp@entry=0x555faf633720) at fold.c:1187
#16 0x0000555fad91936a in checkupdate (wp=0x555faf633720) at fold.c:217
#17 hasFoldingWin (win=0x555faf633720, lnum=5591, firstp=0x555faf633798, lastp=lastp@entry=0x0, 
    cache=cache@entry=1, infop=infop@entry=0x0) at fold.c:158
#18 0x0000555fad91942e in hasFolding (lnum=<optimized out>, firstp=<optimized out>, 
    lastp=lastp@entry=0x0) at fold.c:133
#19 0x0000555fad959c3e in update_topline () at move.c:291
#20 0x0000555fad9118ee in buf_reload (buf=buf@entry=0x555faf25e210, orig_mode=orig_mode@entry=33204)
    at fileio.c:7155
#21 0x0000555fad911d0c in buf_check_timestamp (buf=buf@entry=0x555faf25e210, focus=focus@entry=1)
    at fileio.c:6997
#22 0x0000555fad912422 in check_timestamps (focus=1) at fileio.c:6664
#23 0x0000555fada1091b in ui_focus_change (in_focus=<optimized out>) at ui.c:3203
#24 0x0000555fad91fd96 in vgetc () at getchar.c:1670
#25 0x0000555fad920019 in safe_vgetc () at getchar.c:1801
#26 0x0000555fad96e775 in normal_cmd (oap=0x7ffe2b90c440, toplevel=1) at normal.c:627
#27 0x0000555fada5d665 in main_loop (cmdwin=0, noexmode=0) at main.c:1359
#28 0x0000555fad88d21d in main (argc=<optimized out>, argv=<optimized out>) at main.c:1051

I am already part of the vim mailing list, so I sent an email and can see responses coming in (though sadly not CC’d to me) as I write this post, so hopefully this will be resolved soon.

Linux Kernel Bugs

I found a bug in the Linux Kernel, probably related to the nvidia drivers, but I’m not totally sure. I’d love for this to get resolved, though reporting kernel bugs to Ubuntu has not gone well for me in the past.

Vim sessions

The kernel bug above causes the computer to crash during xrandr events; this means I end up with vim writing a fresh session file during the event (thanks to the excellent Obsession by Tim Pope) and the session file getting hopelessly corrupted, because the write fails midway.

I foolishly mentioned this on the #vim channel on freenode and was reminded how often IRC channels are actually unrelated to aptitude. The people in the channel seemed to think that if the kernel crashes, there is nothing that can be done by a program to avoid losing data. I will argue that while it is hard, it is not impossible. The most basic thing that can and should be done is:

  1. Write to a tempfile
  2. Rename the tempfile to the final file

This should be atomic and safe; a minimal sketch of the idea follows. There are many ways that dealing with files can go wrong, but believing it is impossible to protect against them is unimpressive, to say the least.
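
Here is a minimal sketch of that idea in Perl (purely an illustration, not what Vim or Obsession actually do; the file names are made up):

use strict;
use warnings;

use File::Basename ();
use File::Temp ();

sub atomic_write {
  my ($path, $contents) = @_;

  # write to a tempfile in the same directory, so the rename below
  # stays on one filesystem and remains atomic
  my $tmp = File::Temp->new(
    DIR    => File::Basename::dirname($path),
    UNLINK => 0,
  );

  print {$tmp} $contents;
  $tmp->flush;
  $tmp->sync;   # fsync, so a crash can't leave a half-written file behind
  close $tmp or die "couldn't write temp file: $!";

  # rename(2) atomically replaces the old file with the fully written new one
  rename $tmp->filename, $path or die "couldn't rename: $!";
}

atomic_write('Session.vim', "\" session contents go here\n");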

I will likely submit the above as a proper bug to the Vim team tomorrow. In the meantime this must also be done in Obsession, and I have submitted a small patch to do what I outlined above. I’m battle testing it now and will know soon if it resolves the problem.


I feel better. At the very least I’ve submitted bugs, and in one of the most annoying cases, been able to submit a patch. When you run into a bug, why not do the maintainer a solid and report it? And if you can, fix it!

Posted Tue, May 10, 2016

Putting MySQL in Timeout

At work we are working hard to scale our service to serve more users and have fewer outages. Exciting times!

One of the main problems we’ve had since I arrived is that MySQL 5.6 doesn’t really support query timeouts. It has stall timeouts, but if a query takes too long there’s not a great way to cancel it. I worked on resolving this a few months ago and was disappointed that I couldn’t come up with a good solution that was simple enough not to scare me.

A couple weeks ago we hired a new architect (Aaron Hopkins) and he, along with some ideas from my boss, Bill Hamlin, came up with a pretty elegant and simple way to tackle this.

The solution is in two parts, the client side, and a reaper. On the client you simply set a stall timeout; this example is Perl but any MySQL driver should expose these connection options:

my $dbh = DBI->connect('dbi:mysql:...', 'zr', $password, {
   mysql_read_timeout  => 2 * 60,
   mysql_write_timeout => 2 * 60,
   ...,
});

This will at the very least cause the client to stop waiting if the database disappears. If the client is doing a query and pulling rows down over the course of 10 minutes, but is getting a new row every 30s, this will not help.

To resolve the above problem, we have a simple reaper script:

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use JSON;
use Linux::Proc::Net::TCP;
use Sys::Hostname;
use Getopt::Long::Descriptive;

# the original listing doesn't show where $opt comes from; this minimal
# stand-in provides the --noaction flag checked in kill_query() below
my ($opt, $usage) = describe_options(
  'mysql-reaper %o',
  [ 'noaction', 'only log the queries that would have been killed' ],
);

my $actual_host = hostname();

my $max_timeout = 2 * 24 * 60 * 60;
$max_timeout = 2 * 60 * 60 if $actual_host eq 'db-master';

my $dbh = DBI->connect(
  'dbi:mysql:host=localhost',
  'root',
  $ENV{MYSQL_PWD},
  {
    RaiseError => 1,
    mysql_read_timeout => 30,
    mysql_write_timeout => 30,
  },
);

my $sql = <<'SQL';
SELECT pl.id, pl.host, pl.time, pl.info
  FROM information_schema.processlist pl
 WHERE pl.command NOT IN ('Sleep', 'Binlog Dump') AND
       pl.user NOT IN ('root', 'system user') AND
       pl.time >= 2 * 60
SQL

while (1) {
  my $sth = $dbh->prepare_cached($sql);
  $sth->execute;

  my $connections;

  while (my $row = $sth->fetchrow_hashref) {
    kill_query($row, 'max-timeout') if $row->{time} >= $max_timeout;

    if (my ($json) = ($row->{info} =~ m/ZR_META:\s+(.*)$/)) {
      my $data = decode_json($json);

      kill_query($row, 'web-timeout') if $data->{catalyst_app};
    }

    $connections ||= live_connections();
    kill_query($row, 'zombie') unless $connections->{$row->{host}}
  }

  sleep 1;
}

sub kill_query {
  my ($row, $reason) = @_;
  no warnings 'exiting';

  warn sprintf "killing «%s», reason %s\n", $row->{info}, $reason;
  $dbh->do("KILL CONNECTION ?", undef, $row->{id}) unless $opt->noaction;
  next;
}

sub live_connections {
  my $table = Linux::Proc::Net::TCP->read;

  return +{
    map { $_->rem_address . ':' . $_->local_port => 1 }
    grep $_->st eq 'ESTABLISHED',
    @$table
  }
}

There are a lot of subtle details in the above script, so I’ll do a little bit of exposition. First off, the reaper runs directly on the database server. We define the absolute maximum timeout based on the hostname of the machine: 2 days for reporting and read-only minions, and 2 hours for the master.

The SQL query grabs all running tasks, but ignores a certain set of them. Importantly, we have to whitelist a couple of users, because one (root) is where extremely long-running DDL takes place and the other (system user) is doing replication, basically constantly.

We iterate over the returned queries, immediately killing those that took longer than the maximum timeout. Any queries that our ORM (DBIx::Class) generated have a little bit of logging appended as a comment with JSON in it. We can use that to tweak the timeout further; initially by choking down web requests to a shorter timeout, and later we’ll likely allow users to set a custom timeout directly in that comment.

Finally, we kill queries whose client has given up the ghost. I did a test a while ago where I started a query and then killed the script running it, and I could see that MySQL kept running the query; presumably because, for all it knows, the query could be some long-running UPDATE that still needs to finish. I expect the timeouts will be the main cause of query reaping, but this is a nice stopgap that can clean up queries whose clients have crashed.

I am very pleased with this solution. I even think that if we eventually switch to Aurora all except the zombie checking will continue to work.

Posted Sun, May 8, 2016

A new Join Prune in DBIx::Class

At work a coworker and I recently went on a rampage cleaning up our git branches. Part of that means I need to clean up my own small pile of unmerged work. One of those branches is an unmerged change to our subclass of the DBIx::Class Storage Layer to add a new kind of join prune.

If you didn’t know, good databases can avoid performing a join entirely by looking at the query and seeing whether (and where) the joined-in table is actually used. DBIx::Class does the same thing for databases that do not have such machinery built in. In fact there was a time when it could prune certain kinds of joins that even the lauded PostgreSQL could not, though that may no longer be the case.

The rest of what follows in this blog post is a very slightly tidied up commit message of the original branch. Enjoy!


Recently Craig Glendenning found a query in the ZR codebase that was using significant resources; the main problem was that it included a relationship but didn’t need to. We fixed the query, but I was confused because DBIx::Class has a built in join pruner and I expected it to have transparently solved this issue.

It turns out we found a new case where the join pruner can apply!

If you have a query that matches all of the following conditions:

  • a relationship is joined with a LEFT JOIN
  • that relationship is not in the WHERE
  • that relationship is not in the SELECT
  • the query is limited to one row

You can remove the join for the matching relationship. The WHERE and SELECT conditions should be obvious: if a relationship is used in the WHERE clause, it needs to be joined so the WHERE clause can match against its columns. Similarly, for the SELECT clause, the relationship must be joined so that its columns can actually be referenced in the SELECT list.

The one row and LEFT JOIN conditions are more subtle; consider this case:

You have a query with a limit of 2 and you join in a relationship that has zero or more related rows. If every root row has zero related rows, the result is effectively just the root table, and you get its first two rows. But if every root row has two related rows, the limit of 2 is consumed by the first root row alone, so you only get back one row from the root table.

Similarly, the reason LEFT is required is that with a standard INNER JOIN, the join filters the root table down to rows that actually have related rows.

If you only ask for a single row, a LEFT join does not filter the root table, and the row-multiplying nature of the join cannot change which root row comes first, so you will always get the same row back.
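
To make that concrete, here is a hypothetical DBIx::Class query that meets all four conditions (the Author/books names and $schema are made up for the example; has_many relationships are joined with a LEFT JOIN by default), so the books join can be pruned entirely:

# joins books, but never mentions it in the WHERE or the SELECT,
# and only asks for one row, so the join cannot change the result
my $author = $schema->resultset('Author')->search(
  { 'me.name' => 'Gabriel García Márquez' },
  {
    join => 'books',
    rows => 1,
  },
)->first;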


I’ve pushed the change that adds the new join prune to GitHub, and notified the current maintainer of DBIx::Class in the hopes that it can get merged in for everyone to enjoy.

Posted Fri, Apr 29, 2016

Python: Taking the Good with the Bad

For the past few months I’ve been working on a side project using Python. I’ll post about that project some other time, but now that I’ve used Python a little bit I think I can more reasonably consider it (so not just “meaningful whitespace?!?”).

It’s much too easy to write a bunch of stuff that is merely justification of the status quo (in my case that is the use of Perl.) I’m making an effort to consider all of the good things about Python and only mentioning Perl when there is a lack. I’d rather not compare them at all, but I don’t see a way around that without silly mental trickery.

Note that this is about Python 2. If you want to discuss Python 3, let’s compare it to Perl 6.

Generally awesome stuff about Python

The following are my main reasons for liking Python. They are in order of importance, and some have caveats.

Generators

Generators (a limited form of coroutine) are an awesome linguistic feature. It took me a long time to understand why they are useful, but I think I can summarize it easily now:

What if you wanted to have a function with an infinite loop in the middle?

In Perl, the typical answer might be to build an iterator. This is fine, but it can be a lot of work. In Python, you just write normal code and use a special keyword, yield. For simple stuff, the closures you have available in Perl will likely seem less magical, but for complicated things, like iterating over the nodes in a tree, Python will almost surely be easier.
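
For the simple case, the hand-rolled Perl iterator is just a closure, as in this little sketch; it’s when you want to suspend something like a recursive tree walk that yield really starts to pull ahead:

# each call to the returned sub picks up where the last one left off,
# because the closure captures $i
sub counter {
  my $i = shift // 0;
  return sub { return $i++ };
}

my $next = counter(5);
print $next->(), "\n" for 1 .. 3;   # 5, 6, 7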

Let me be clear: in my mind, generators are an incredibly important feature and that Perl lacks them is significant and terrible. There are efforts to get them into core, and there is a library that implements them, but it is not supported on the newest versions of Perl.

Builtins

Structured data is one of the most important parts of programming. Arrays are super important; I think that’s obvious. Hashes are, in my opinion, equally useful. Most other collection types are arguably past the point of diminishing returns once hashes are within easy reach, but a few are included in Python and I think that’s a good thing. To clarify, in Python one could write:

cats = set(['Dantes', 'Sunny Day', 'Wheelbarrow'])
tools = set(['Hammer', 'Screwdriver', 'Wheelbarrow'])

print cats.intersection(tools)

In Perl that can be done with a hash, but it’s a hassle, so I tend to use Set::Scalar.
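
The Set::Scalar version of the same example looks something like this sketch:

use Set::Scalar;

my $cats  = Set::Scalar->new('Dantes', 'Sunny Day', 'Wheelbarrow');
my $tools = Set::Scalar->new('Hammer', 'Screwdriver', 'Wheelbarrow');

# intersection returns another Set::Scalar, which stringifies on its own
print $cats->intersection($tools), "\n";   # (Wheelbarrow)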

Python also ships with an OrderedDict, which is like Perl’s Tie::IxHash. But Tie::IxHash is sorta aging and weird and what’s with that name?

A Python programmer might also mention that the DefaultDict is cool. I’d argue that the DefaultDict merely works around Python’s insistence that the programmer be explicit about a great many things. That is: it is a workaround for Pythonic dogma.

Rarely need a compiler for packages

In my experience, only very rarely do libraries need to be compiled in Python. Obviously math-intensive stuff like crypto or high-precision arithmetic will need a compiler, but the vast majority of other things do not. I think part of the reason for this is that Python ships with an FFI library (ctypes). So awesome.

In Perl, even the popular OO framework Moose requires a compiler!

“protocols”

If you want to define your own weird kind of dictionary in Python, it’s really easy: you subclass dict and define around ten methods. It will all just work. This applies to all of Python’s builtins, I believe.

In Perl, you have to use tie, which is similar, but you can end up with oddities related to Perl’s weird indirect method syntax. Basically, things like print $fhobject $str will often not work as expected. Sad camel.
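
For comparison, the tie route looks roughly like this sketch: subclass Tie::StdHash and override only the methods you care about.

package UpperKeys;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

# upper-case every key on the way in
sub STORE {
  my ($self, $key, $value) = @_;
  $self->SUPER::STORE(uc $key, $value);
}

package main;

tie my %h, 'UpperKeys';
$h{shout} = 1;
print join(', ', keys %h), "\n";   # SHOUT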

Interactive Python Shell

Python ships with an excellent interactive shell, which can be used by simply running python. It has line editing, history, builtin help, and lots of other handy tools for testing out little bits of code. I have lots of little tools to work around the lack of a good interactive shell in Perl. This is super handy.

Simple Syntax

The syntax of Python can be learned by a seasoned programmer in an afternoon. Awesome.

Cool, weird projects

I’ll happily accept more examples for this. A few spring to mind:

  1. BCC is sorta like DTrace, but for Linux.
  2. PyNES lets you run NES games written in Python.
  3. BITS is a Python-based operating system, for doing weird hardware stuff without having to write C.

Batteries Included

Python ships with a lot of libraries, like the builtins above, that are not quite so generic. Some examples that I’ve used include a netrc parser, an IMAP client, some email parsing tools, and some stuff for building and working with iterators. The awesome thing is that I’ve written some fairly handy tools that in Perl would have certainly required me to reach for CPAN modules.

What’s not so awesome is that the libraries are clearly not of the high quality one would desire. Here are two examples:

First, the core netrc library can only select by host, instead of host and account. This was causing a bug for me when using OfflineIMAP. I rolled up my sleeves, cloned cpython, fixed the bug, and then found that it had been reported, with a patch, five years ago. Not cool.

Second, the builtin email libraries are pretty weak. To get the content of a header I had to use the following code:

import email.header
import re

decoded_header = str(email.header.make_header(email.header.decode_header(header)))
unfolded_header = re.sub('[\r\n]', '', decoded_header)

I’m not impressed.

There are more examples, but this should be sufficient.

Now before you jump on me as a Perl programmer: Perl definitely has some weak spots in its included libraries, but unlike with Python, the vast majority of those are actually on CPAN and can be updated without updating Perl. Unless I am missing something, that is not the case with the Python core libraries.

Prescriptive

The Python community as a whole, or at least my interaction with it, seems fairly intent on defining the one true way to do anything. This is great for new programmers, but I find it condescending and unhelpful. I like to say that the following (stolen from various media) together form the programmer’s creed:

That which compiles is true.

Nothing is True and Everything is Permissible

“Considered Harmful” Considered Harmful

Generally not awesome stuff about Python

As before, these are things that bother me about Python, in order.

Variable Scope and Declaration

Python seems to aim to be a boring but useful programming language. Like Java, but a scripting language. This is a laudable goal and I think Go is the newest in this tradition. Why would a language that intends to be boring have any scoping rules that are not strictly and exclusively lexical? If you know, tell me.

In Perl, the following code would not even compile:

use strict;

sub print_x { print("$x\n") }
print_x();
my $x = 1;
print_x();

In Python, the equivalent runs happily and does what a crazy person would expect (the first call throws a NameError at runtime; the second prints 1):

def foo():
   print(x)

foo()
x = 1
foo()

The real problem here is that in Python, variables are never declared. It is not an error to set x = 1 in Python, how else would you create the variable? In Perl, you can define a variable as lexical with my, global with our, and dynamic with local. Python is a sad mixture of lexical and global. The fact that anyone would ever need to explain scoping implies that it’s pretty weird.

PyPI and (the lack of) friends

I would argue that since the early 2000s, a critical part of a language is its ecosystem. A language that has no libraries makes for lonely, dreary work. Python has plenty of libraries, but the web presence of its ecosystem is wildly fractured. Here are some things that both search.cpan.org and MetaCPAN do that PyPI does not:

  • Include and render all of the documentation for all modules (example)
  • Include a web accessible version of all (or almost all) releases of the code (example, example)

And MetaCPAN does a ton more on top of that.

There’s also a constellation of other tools; here are my favorites:

  • CPANTesters aggregates test results from individuals and smoke machines across huge swaths of CPAN on a ton of operating systems. Does your module run on Solaris?
  • rt.cpan.org is a powerful issue tracker that creates a queue of issues for every module released on CPAN. Nowadays, with GitHub, that’s not as important as it used to be, but even so, RT still lets you create issues without needing to log in.

Documentation

This is related to my first complaint about PyPI above. When I install software on my computer, I want to read the docs that are local to the installed version. There are two reasons for this:

  1. I don’t want to accidentally read docs for a different version than what is installed.
  2. I want to be able to read documentation when the internet is out.

Because the documentation of Python packages is so free-form, people end up hosting their docs on random websites. That’s fine, I guess, but they often don’t include the documentation in the installed module. For example, if you install boltons, you’ll note that while you can run pydoc boltons, there is no way to see this page via pydoc. Pretty frustrating.

On top of that, the documentation is, by convention, reStructuredText. rst is fine as a format; it’s like Markdown or POD (Perl’s documentation format) or whatever. But there are (at least) two very frustrating issues with it:

  1. There is no general linking format. In Perl, if I write L<DBIx::Class::Helpers>, it will link to the documentation for that module. Because of the free-form documentation in Python, this is impossible.
  2. It doesn’t render at all with pydoc; you just end up seeing all the noisy markup.

And it gets worse! There is documentation for core Python that is stored on a wiki! A good example is the page about the time complexity of various builtins. There is no good reason for this documentation to not be bundled with the actual Python release.

Matt’s Script Archive

As much as the prescriptivism of Python exists to encourage the community to write things in a similar style, a ton of old code still exists that is just as crappy as all the old Perl code out there.

I love examples, and I have a good one for this. My little Python project involves parsing RSS (and Atom) feeds. I asked around and was pointed at feedparser. It’s got a lot of shortcomings; the one that comes to mind is that if you want to parse feeds without sanitizing the included HTML, you have to mutate a global. Worse, this is only documented in a comment in the source code.

Unicode

Python has this frustrating behaviour when it comes to printing Unicode. Basically, if the program is printing Unicode (the string is not bytes, but meaningful characters) to a console, Python assumes it can encode it as UTF-8. If it’s printing to anything else, it defaults to ASCII and will often throw an exception. This means you might have code that works perfectly well when you test it interactively, and even works when redirected to a file as long as it happens to print only ASCII, but throws an exception as soon as characters outside of ASCII show up. (Try it and see: python -c 'print(u"\u0420")' | cat) (Read more here.)

It’s also somewhat frustrating that the Python wiki argues that Python predates Unicode and thus cannot be expected to support it, while Perl predates even Python but has excellent Unicode support built into Perl 5 (the equivalent of Python 2.x). A solid example: while Python encourages users to be aware of Unicode, it does not give them a way to compare strings ignoring case. Here’s an example of where that matters: if we are ignoring case, “ß” should be equal to “ss”. In Perl you can verify this by running perl -Mutf8 -E'say "equal" if fc "ß" eq fc "ss"'. In Python one must download a package from PyPI, which is documented as being an order of magnitude slower than the version that is core in Python 3.

SIGPIPE

In Unix there is this signal, SIGPIPE, that gets sent to a process when the pipe it is writing to gets closed. This can be a simple efficiency improvement, but even ignoring efficiency, it will get used. Imagine you have code that reads from a database, then prints a line, then reads, etc. If you wanted the first 10 rows, you could pipe to head -n10 and both truncate after the 10th line and kill the program. In Python, this causes an exception to be thrown, so users of Python programs who know and love Unix will either be annoyed that they see a stack trace, or submit annoying patches to globally ignore SIGPIPE in your program.
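
The difference is easy to see from a shell. The Perl one-liner below just gets killed by SIGPIPE once head exits and the pipeline finishes quietly, whereas the equivalent Python 2 loop dies with an IOError (Broken pipe) traceback:

perl -E 'say $_ for 1 .. 1_000_000' | head -n 1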


Overall, I think Python is a pretty great language to have available. I still write Perl most of the time, but knowing Python has definitely been helpful. Another time I’ll write a post about being a polyglot in general.

Posted Thu, Apr 21, 2016

Humane Interfaces

In this post I just want to briefly discuss and demonstrate a humane user interface that I invented at work.

At ZipRecruiter, where I work, we use a third party system called Bonus.ly. Each employee is given $20 in the form of 100 Zip Points at the beginning of each month. These points can be given to any other employee for any reason, and then redeemed for gift cards basically anywhere (Amazon, Starbucks, Steam, REI, and even as cash with Paypal, just to name a few.)

Of course the vast majority of users give points using the web interface, where you pick a user with an autocompleter, select the amount with a dropdown, and type the reason and hashtag (you must include a hashtag) in a text field. This is fine for most users, but I hate the browser because it’s so sluggish and bloated. The other option is to use the built-in Slack interface. I used that for a long time; it works like this: /give +3 to @sara for Helping me with my UI #teamwork

This is pretty good, but there is one major problem: the username above is based on the local part of an email address, even though in Slack @foo looks a lot like a Slack username. I kept accidentally giving points to the wrong Sara!

Bonusly has a pretty great API, and one of my coworkers released an interface to it on CPAN. I used that API to write a small CLI script. The actual script is not that important (but if you are interested let me know and I’ll happily publish it); what’s cool is the interface. First off, here is the argument parsing:

my ($amount, $user, $reason);

for (@ARGV) {
  if (m/^\d+$/) {
    $amount = $_;
  } elsif (!m/#/) {
    $user = $_;
  } else {
    $reason = $_;
  }
}

die "no user provided!\n"   unless $user;
die "no amount provided!\n" unless $amount;
die "no reason provided!\n" unless $reason;

The above parses an amount, a user, and a reason for the bonus. The amount must be a positive integer, and the reason must include a hashtag. Because of this, we can ignore the ordering. This solves an unstated annoyance with the Slack integration of Bonusly; I do not have to remember the ordering of the arguments, I just type what makes sense!
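
So, with a hypothetical bonus wrapper around the script, all of these invocations mean exactly the same thing:

bonus 3 sara 'Helping me with my UI #teamwork'
bonus sara 'Helping me with my UI #teamwork' 3
bonus 'Helping me with my UI #teamwork' 3 sara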

Next up, the user validation, which resolves the main problem:

# _b() returns the Bonusly API client and _render_user() formats a user as
# "Frew Schmidt <frew@ziprecruiter.com>"; both helpers are elided from this
# listing. The following just makes an array of matching users:

my @users =
  grep _render_user($_) =~ m/$user/i,
  @{_b->users->list->{result}};


if (@users > 1) {
  warn "Too many users found!\n";
  warn ' * ' . _render_user($_) . "\n" for @users;
  exit 1;
} elsif (@users < 1) {
  warn "No users match! Could $user be a contractor?\n";
  exit 2;
}
The above keeps me from accidentally selecting one of many matching users by making whoever runs the script provide a more specific match.

Of course the above UI is not perfect for every user. But I am still very pleased to have unordered positional arguments. I hope this inspires you to reduce requirements on your users when they are using your software.

Posted Sat, Apr 9, 2016

CloudFront Migration Update

When I migrated my blog to CloudFront I mentioned that I’d post about how it is going in late March. Well it’s late March now so here goes!

First off, I switched away from the awscli tools to s3cmd, because it does the smart thing and only syncs a file if its md5 checksum has changed. Not only does this make a sync significantly faster, it also reduces PUTs, which are a major part of the cost of this endeavour.

Speaking of costs, how much is this costing me? February, which was a partial month, cost a total of $0.03. One might expect March to cost more than four times that amount (still couch change) but because of the s3cmd change I made, the total cost in March so far is $0.04, with a forecast of $0.05. There is one cost that I failed to factor in: logging.

While my full blog is a svelte 36M, the CloudFront logs for just the past 36 days are almost double that, and they are compressed with gzip! The logging incurs additional PUTs to S3 as well as an additional storage burden. The free tier includes 5G of free storage, but pulling down the log files as structured (a gzipped file per region per hour) is a big hassle. I had over five thousand log files to download, and it took about an hour. I’m not sure how I’ll deal with it in the future, but I may periodically pull down those logs, consolidate them, and replace them with a single rolled-up file per month.

Because the logs were slightly easier to interact with than before I figured I’d pull them down and take a look. I had to write a little Perl script to parse and merge the logs. Here’s that, for the interested:

#!/usr/bin/env perl

use 5.20.0;
use warnings;

use autodie;

use Text::CSV;

my $glob = shift;
my @values = @ARGV;
my @filelisting = glob($glob);

for my $filename (@filelisting) {
  open my $fh, '<:gzip', $filename; # the :gzip layer requires PerlIO::gzip
  my $csv = Text::CSV->new({ sep_char => "\t" });
  $csv->column_names([qw(
      date time x_edge_location sc_bytes c_ip method host cs_uri_stem sc_status
      referer user_agent uri_query cookie x_edge_result_type x_edge_request_id
      x_host_header cs_protocol cs_bytes time_taken x_forwarded_for ssl_protocol
      ssl_cipher x_edge_response_result_type
  )]);
  # skip headers
  $csv->getline($fh) for 1..2;
  while (my $row = $csv->getline_hr($fh)) {
    say join "\t", map $row->{$_}, @values
  }
}

To get all of the accessed URLs, with counts, I ran the following oneliner:

perl read.pl '*.2016-03-*.gz' cs_uri_stem | sort | uniq -c | sort -n

There are some really odd requests here, along with some sorta frustrating issues. Here are the top thirty, with counts:

  27050 /feed
  24353 /wp-content/uploads/2007/08/transform.png
  13723 /feed/
   8044 /static/img/me200.gif
   5011 /index.xml
   4607 /favicon.ico
   3866 /
   2491 /static/css/styles.css
   2476 /static/css/bootstrap.min.css
   2473 /static/css/fonts.css
   2389 /static/js/bootstrap.min.js
   2384 /static/js/jquery.js
   2373 /robots.txt
    966 /posts/install-and-configure-the-ms-odbc-driver-on-debian/
    637 /wp-content//uploads//2007//08//transform.png
    476 /archives/1352
    311 /wp-content/uploads/2007/08/readingminds2.png
    278 /keybase.txt
    266 /posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/
    225 /archives/1352/
    197 /feed/atom/
    191 /static/img/pong.p8.png
    166 /posts/concurrency-and-async-in-perl/
    155 /n/a
    149 /posts/weirdest-interview-so-far/
    144 /apple-touch-icon.png
    140 /apple-touch-icon-precomposed.png
    133 /posts/dbi-logging-and-profiling/
    126 /posts/a-gentle-tls-intro-for-perlers/
    120 /feed/atom

What follows is pretty intense navel gazing that I suspect very few people care about. I think it’s interesting but that’s because like most people I am somewhat of a narcissist. Feel free to skip it.

So /feed, /feed/, /feed/atom, and /feed/atom/ are in this list a lot, and sadly when I migrated to CloudFront I failed to set up the redirect header. I’ll be figuring that out soon if possible.

/, /favicon.ico, and /index.xml are all normal and expected. It really surprises me how many things are accessing / directly. A bunch of it is people, but a lot is feed readers. Why they would hit / is beyond me.

/wp-content/uploads/2007/08/transform.png and /wp-content//uploads//2007//08//transform.png (from this page) seems to be legitimately popular. It is bizarrely being accessed from a huge variety of User Agents. At the advice of a friend I looked more closely and it turns out it’s being hotlinked by a Vietnamese social media site or something. This is cheap enough that I don’t care enough to do anything about it.

/wp-content/uploads/2007/08/readingminds2.png is similar to the above.

/static/img/me200.gif is an avatar that I use on a few sites. Not super surprising, but as always: astounded at the number.

/robots.txt is being accessed a lot, presumably by all the various feed readers. It might be worthwhile to actually create that file. No clue.

/static/css/* and /static/js/* should be pretty obvious. I would consider using those from a CDN but my blog is already on a CDN so what’s the point! But it might be worth at least adding some headers so those are cached by browsers more aggressively.

/posts/install-and-configure-the-ms-odbc-driver-on-debian/ (link) is apparently my most popular post, and I would argue that that is legitimate. I should automate some kind of verification that it continues to work. I try to keep it updated but it’s hard now that I’ve stopped using SQL Server myself.

/archives/1352 and /archives/1352/ are the pre-Hugo URL for the announcement of DBIx::Class::DeploymentHandler. I’m not sure why the old URL is still being linked to, but I am glad I put all that effort into ensuring that old links keep working.

/keybase.txt is the identity proof for Keybase (which I have never used by the way.) It must check every four hours or something.

/posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/ (link) is a weird post of mine, but I’m glad that a lot of people are interested, because it was a lot of work to do.

/static/img/pong.p8.png, /posts/weirdest-interview-so-far/ (link), and /posts/dbi-logging-and-profiling/ (link) were all on / at some point in the month so surely people just clicked those from there.

/posts/concurrency-and-async-in-perl/ (link) and /posts/a-gentle-tls-intro-for-perlers/ (link) are more typical posts of mine, but are apparently pretty popular and I would say for good reason.

/n/a, /apple-touch-icon.png, /apple-touch-icon-precomposed.png all seem like some weird user agent thing, like maybe iOS checks for that if someone makes a bookmark?

World Wide Readership

Ignoring the seriously hotlinked image above, I can easily see where most of my blog is accessed:

perl read.pl '*.2016-03-*.gz' cs_uri_stem x_edge_location  | \
  grep -v 'transform' | cut -f 2 | perl -p -e 's/[0-9]+//' | \
  sort | uniq -c | sort -n

Here are the top 15 locations serving my blog:

  21330 JFK # New York
   9668 IAD # Washington D.C.
   8845 ORD # Chicago
   7098 LHR # London
   6536 FRA # Frankfurt
   5319 DFW # Dallas
   4568 ATL # Atlanta
   4328 SEA # Seattle
   3345 SFO # San Francisco
   3137 CDG # Paris
   2991 AMS # Amsterdam
   2966 EWR # Newark
   2339 LAX # Los Angeles
   1993 ARN # Stockholm
   1789 WAW # Warsaw

I’m super pleased with this, because before the migration to CloudFront all of it was served from a single server in DFW. That was almost surely enough, but it was slower, especially for readers outside of the States.


Aside from the fact that I have not yet set up the redirect for the old feed URLs, I think the migration to CloudFront has gone very well. I’m pleased that I’m less worried about rebooting my Linode and that my blog is served quickly, cheaply, and efficiently to readers worldwide.

Posted Sat, Mar 26, 2016

DBI Logging and Profiling

If you use Perl and connect to traditional relational databases, you use DBI. Most of the Perl shops I know of nowadays use DBIx::Class to interact with a database. This blog post is about how I “downported” some of my DBIx::Class ideas to DBI. Before I say much more I have to thank my boss, Bill Hamlin, for showing me how to do this.

Ok so when debugging queries, with DBIx::Class you can set the DBIC_TRACE environment variable and see the queries that the storage layer is running. Sadly sometimes the queries end up mangled, but that is the price you pay for pretty printing.

You can actually get almost the same thing with DBI directly by setting DBI_TRACE to SQL. That is technically not supported everywhere, but it has worked everywhere I’ve tried it. If I recall correctly though, unlike with DBIC_TRACE, using DBI_TRACE=SQL will not include any bind arguments.

Those two features are great for ad hoc debugging, but at some point in the lifetime of an application you want to count the queries executed during some workflow. The obvious example is during the lifetime of a request. One could use DBIx::Class::QueryLog or something like it, but that will miss queries that were executed directly through DBI, and it’s also a relatively expensive way to just count queries.

The way to count queries efficiently involves using DBI::Profile, which is very old school, like a lot of DBI. Here’s how I got it to work just recording counts:

#!/usr/bin/env perl

use 5.12.0;
use warnings;

use Devel::Dwarn;
use DBI;
use DBI::Profile;
$DBI::Profile::ON_DESTROY_DUMP = undef;

my $dbi_profile = DBI::Profile->new(
  Path => [sub { $_[1] eq 'execute' ? ('query') : (\undef) }]
);

$DBI::shared_profile = $dbi_profile;

my $dbh = DBI->connect('dbi:SQLite::memory:');
my $sth = $dbh->prepare('SELECT 1');
$sth->execute;
$sth->execute;
$sth->execute;

$sth = $dbh->prepare('SELECT 2');
$sth->execute;
$sth->execute;
$sth->execute;

my @data = $dbi_profile->as_node_path_list;
Dwarn \@data;

And in the above case the output is:

[
  [
    [
      6,
      "6.67572021484375e-06",
      "2.86102294921875e-06",
      0,
      "2.86102294921875e-06",
      "1458836436.12444",
      "1458836436.12448"
    ],
    "query"
  ]
]

The outermost arrayref is supposed to contain all of the profiled queries, so each arrayref inside of that is a query, with its profile data as the first value (another arrayref) inside, and all of the values after that first arrayref are user configurable.

So the above means that we ran six queries. There are some numbers about durations, but they are so small that I won’t consider them carefully here; see the link above for more information. Normally, if you had used DBI::Profile, you would see two distinct queries, each with its own set of profiling data, but here we see them all merged into a single bucket. All of the magic for that is in the Path code reference.

Let’s dissect it carefully:

$_[1] eq 'execute' # 1
  ? ('query')      # 2
  : (\undef)       # 3

Line 1 checks the DBI method being used. This is how we avoid hugely inflated numbers; we are trading some granularity for a more comprehensible count. See, if you prepare 1000 queries you are typically still doing 1000 round trips to the database, but that’s a weird thing to report, and telling a developer how many “queries they did” is easier to understand when it simply means executing a query.

In line 2 we return ('query'). This is what causes all queries to be treated as if they were the same. We could have returned any constant string here. If we wanted to do something weird, like count based on type of query, we could do something clever like the following:

return (\undef) unless $_[1] eq 'execute';
local $_ = $_;

s/^\s*(\w+)\s+.*$/$1/;
return ($_);

That would create a bucket for SELECT, UPDATE, etc.

Ok back to dissection; line 3 returns (\undef), which is weird, but it’s how you signal that you do not want to include a given sample.


So the above is how you generate all of the profiling information. You can be more clever and include caller data or even bind parameters, though I’ll leave those as a post for another time. Additionally, you could carefully record your data and then do some kind of formatting at read time. Unlike DBIC_TRACE where you can end up with invalid SQL, you could use this with post-processing to show a formatted query if and only if it round trips.
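
As a teaser, bucketing by caller as well looks roughly like this, using one of DBI::Profile’s built-in magic path elements (see its documentation for the full list):

my $dbi_profile = DBI::Profile->new(
  Path => [
    # same execute-only filter as before...
    sub { $_[1] eq 'execute' ? ('query') : (\undef) },
    # ...plus the file and line of the caller (and its caller)
    '!Caller2',
  ],
);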

Now go forth; record some performance information and ensure your app is fast!

UPDATE: I modified the ON_DESTROY_DUMP line to set it to undef instead of an empty code reference; this correctly avoids a lot of work at object destruction time. Read this for more information.

Posted Thu, Mar 24, 2016

How to Enable ptrace in Docker 1.10

This is just a quick blog post about something I got working this morning. Docker currently adds some security to running containers by wrapping them in both AppArmor (or presumably SELinux on Red Hat systems) and seccomp BPF-based syscall filters. This is awesome, and turning either or both off is not recommended. Security is a good thing, and learning to live with it will make for a better time.

Normally ptrace is disabled by the default seccomp profile. ptrace is used by the incredibly handy strace. If I can’t strace, I get the feeling that the walls are closing in, so I needed it back.

One option is to disable seccomp filtering entirely, but that’s less secure than just enabling ptrace. Here’s how I enabled ptrace but left the rest as is:

A handy perl script

#!/usr/bin/perl

use strict;
use warnings;

# for more info check out https://docs.docker.com/engine/security/seccomp/

# This script simply helps to mutate the default docker seccomp profile.  Run it
# like this:
#
#     curl https://raw.githubusercontent.com/docker/docker/master/profiles/seccomp/default.json | \
#           build-seccomp > myapp.json

use JSON;

my $in = decode_json(do { local $/; <STDIN> });
push @{$in->{syscalls}}, +{
  name => 'ptrace',
  action => 'SCMP_ACT_ALLOW',
  args => []
} unless grep $_->{name} eq 'ptrace', @{$in->{syscalls}};

print encode_json($in);

In action

So without the custom profile you can see ptrace not working here:

$ docker run alpine sh -c 'apk add -U strace && strace ls'
fetch http://dl-4.alpinelinux.org/alpine/v3.2/main/x86_64/APKINDEX.tar.gz
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
strace: test_ptrace_setoptions_for_all: PTRACE_TRACEME doesn't work: Operation not permitted
strace: test_ptrace_setoptions_for_all: unexpected exit status 1

And then here is using the profile we generated above:

$ docker run --security-opt "seccomp:./myapp.json" alpine sh -c 'apk add -U strace && strace ls'
2016/03/18 17:08:53 Error resolving syscall name copy_file_range: could not resolve name to syscall - ignoring syscall.
2016/03/18 17:08:53 Error resolving syscall name mlock2: could not resolve name to syscall - ignoring syscall.
fetch http://dl-4.alpinelinux.org/alpine/v3.2/main/x86_64/APKINDEX.tar.gz
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
execve(0x7ffe02456c88, [0x7ffe02457f30], [/* 0 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x7f0df919c048) = 0
set_tid_address(0x7f0df919c080)         = 16
mprotect(0x7f0df919a000, 4096, PROT_READ) = 0
mprotect(0x5564bb1e7000, 16384, PROT_READ) = 0
getuid()                                = 0
ioctl(0, TIOCGWINSZ, 0x7ffea2895340)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
stat(0x5564bafdde27, {...})             = 0
open(0x5564bafdde27, O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 512
lstat(0x5564bb1ec860, {...})            = 0
lstat(0x5564bb1ec900, {...})            = 0
lstat(0x5564bb1ec9a0, {...})            = 0
lstat(0x5564bb1eca40, {...})            = 0
lstat(0x5564bb1ecae0, {...})            = 0
lstat(0x5564bb1ecb80, {...})            = 0
lstat(0x5564bb1ecc20, {...})            = 0
lstat(0x5564bb1eccc0, {...})            = 0
lstat(0x5564bb1ecd60, {...})            = 0
lstat(0x5564bb1ece00, {...})            = 0
lstat(0x5564bb1ecea0, {...})            = 0
lstat(0x5564bb1ecf40, {...})            = 0
lstat(0x5564bb1ecfe0, {...})            = 0
lstat(0x7f0df919e6e0, {...})            = 0
lstat(0x7f0df919e780, {...})            = 0
bin
dev
etc
home
lib
linuxrc
media
mnt
proc
root
run
sbin
sys
tmp
usr
var
lstat(0x7f0df919e820, {...})            = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 0
close(3)                                = 0
ioctl(1, TIOCGWINSZ, 0x7ffea2895278)    = -1 ENOTTY (Not a tty)
writev(1, [?] 0x7ffea2895210, 2)        = 4
writev(1, [?] 0x7ffea2895330, 2)        = 70
exit_group(0)                           = ?
+++ exited with 0 +++

A final warning

The above is not too frustrating and is more secure than disabling seccomp entirely, but enabling ptrace as a general course of action is likely to be wrong. I am doing this because it helps with debugging stuff inside of my container, but realize that for long running processes you can always strace processes that are running in the container from the host.

Posted Fri, Mar 18, 2016