Building Secure UserAgents

I have been working on making an HTTP client (also known as a user agent) that is safe for end-users to control. I investigated building it in Perl, Python, asynchronous Perl, and Go.

In my brief downtime during paternity leave I’ve been toying with a new application. One of the things this application will do is make web requests on behalf of users. There are plenty of examples of applications that do this already: RSS readers, anything that has OpenID login support, and things that do postbacks; when someone sends an SMS to my Twilio number, it hits an endpoint of my choosing.

Sometimes applications that do these kinds of requests can be vulnerable to attack. Last year Clint Ruoho found a handful of problems with Pocket, a service Mozilla had recently bundled with Firefox.

The vulnerabilities listed there are only the beginning. Here are some things that an attacker could do:

  • Connect to private services, listening only on localhost, assumed to be secure
  • Read from AWS EC2 UserData (which Ruoho did in the example above)
  • Connect to private services running on other servers that are not normally addressable from the outside world

How do we protect against this?

I suspect that most people protect against this by analyzing the URL in the request:

if ($req->url->host eq '127.0.0.1') { ... }

For example, today, if you go to http://isup.me/127.0.0.1 (or the localhost version) it knows that you are hitting a “non-internet” URL. I made a domain (test.afoolishmanifesto.com) that resolves to 127.0.0.1 and today, if you go to http://isup.me/test.afoolishmanifesto.com it claims that the site is actually up. And that’s just the tip of the iceberg. We can tell that isup.me is running in an AWS-like environment because http://isup.me/169.254.169.254 seems to be “up” from the server’s perspective. There are a non-trivial number of private IP addresses like this (details in the appendix.)

So at the very least we cannot merely inspect the request, we need to verify the resolution of the domain.

use Socket qw(getaddrinfo getnameinfo SOCK_RAW NI_NUMERICHOST);
my (undef, @addrs) = getaddrinfo($req->uri->host, "", { socktype => SOCK_RAW });
my @ips = map {
   my (undef, $ip) = getnameinfo($_->{addr}, NI_NUMERICHOST);
   $ip
} @addrs;
if (grep { $_ eq '127.0.0.1' } @ips) { ... }

Even that is insufficient though. As Ruoho found, many user agents will automatically handle redirects, so even though the implementor may have done all of the above (which I think is non-trivial; I left out a lot of error handling in the second part and none of it correctly handles all of the various IP masks,) a domain could be validated, and then redirect to an IP that should have been blocked.

There are also what are sometimes called “tarpits.” Some user agents define timeouts as “stall” timeouts: they reset whenever any progress is made. Consider the Slowloris attack, but implemented on the server side instead of at the client. Similarly, a DNS server can return long chains of CNAMEs to cause the same kind of problem. This should be fixed with a global timeout (instead of the more common stall timeouts referenced before.)
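For the classic blocking clients discussed below, one blunt way to get a true wall-clock limit is to wrap the entire request in alarm. This is only a sketch (Perl’s safe signals and the client’s own exception handling make it less than bulletproof), and the 15 second budget is an arbitrary number I picked for illustration; depending on the client the timeout may surface as an exception or as an error response, so check both:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 10 );   # stall timeout only

my $response = eval {
   local $SIG{ALRM} = sub { die "global timeout\n" };
   alarm 15;                # hard wall-clock budget for the whole request
   my $r = $ua->get('http://example.com');
   alarm 0;                 # cancel the alarm if we finished in time
   $r;
};
alarm 0;                    # belt and braces in case the eval died
die $@ if !$response;
die "request failed: " . $response->status_line
   if !$response->is_success;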

Another vulnerability is unexpected schemata for requests. Some clients are smart enough to access file://, ftp://, etc. Clients like this must be defanged such that they only access http:// and https://. I tend to only use less magical clients, but support for the above is only a patch away.
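A sketch of the kind of check I mean, done before the request is ever handed to the user agent (the allowed list is just http and https, per the above):

my %allowed_scheme = map { $_ => 1 } qw(http https);

my $scheme = lc($req->uri->scheme // '');
die "illegal scheme: $scheme\n"
   unless $allowed_scheme{$scheme};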

Solutions

The redirect detail makes it clear that the post resolution verification must happen within the user agent. A solid user agent design should make this reasonably doable. The first user agent I’d heard of that tackled these problems (though likely not the first in existence) is LWPx::ParanoidAgent, made by Brad Fitzpatrick almost surely while at LiveJournal to protect against attacks originating from OpenID servers. LWP::UserAgent::Paranoid has since supplanted it with better, more modular code; but the general idea and usage is the same.

IO::Async

The problem with these two modules is that they are written in the classic blocking style. If you need to make 20 HTTP requests and each takes 0.5s you just spent 10s. Newer tools are asynchronous, and so could do 20 HTTP requests in parallel. When I do async in Perl I use IO::Async. In IO::Async here is how you could create a safe client:

#!/usr/bin/env perl

use 5.24.0;
use warnings;

use Net::Async::HTTP;
use IO::Async::Loop::Epoll;

use Net::Subnet;

# this list is incomplete, see the appendix
my $private = subnet_matcher qw(
   10.0.0.0/8
   172.16.0.0/12
   192.168.0.0/16
   127.0.0.0/8
   169.254.0.0/16
);

my $loop = IO::Async::Loop::Epoll->new;
my $http = Net::Async::HTTP->new(
   timeout => 10,
);

$loop->add( $http );

my ( $response ) = $http->do_request(
   uri => URI->new( shift ),
   on_ready => sub {
      my $conn = shift;
      my $ip = $conn->read_handle->peerhost;
      if ($private->($ip)) {
        $conn->close;
        return Future->fail('Illegal IP')
      }
      Future->done;
   },
)->get;

print $response->code;

If I end up using Perl for this project I’ll likely publish a subclass of naHTTP, or submit a patch, allowing the on_ready handler to be set for the whole class instead of requiring it to be set per request.
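Something like the following is what I have in mind; the package name is made up and this is only a sketch of the idea, not a published module:

package Net::Async::HTTP::Paranoid;

use parent 'Net::Async::HTTP';
use Net::Subnet;
use Future;

# same incomplete list as above; see the appendix
my $private = subnet_matcher qw(
   10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8 169.254.0.0/16
);

sub do_request {
   my ($self, %args) = @_;

   # install the vetting hook unless the caller passed their own
   $args{on_ready} //= sub {
      my $conn = shift;
      return Future->fail('Illegal IP')
         if $private->($conn->read_handle->peerhost);
      Future->done;
   };

   $self->SUPER::do_request(%args);
}

1;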

Go

Before I came up with the async Perl option above I had come to the conclusion that it would be a ton of work to get it working in IO::Async and that I should just use Go. I might still use Go, as it’s better supported for code of this nature. In Go I was able to use basically the same technique as above:

package main

import (
  "errors"
  "fmt"
  "net"
  "net/http"
  "os"
  "time"
)

func main() {
  _, net1, _ := net.ParseCIDR("10.0.0.0/8")
  _, net2, _ := net.ParseCIDR("172.16.0.0/12")
  _, net3, _ := net.ParseCIDR("192.168.0.0/16")
  _, net4, _ := net.ParseCIDR("127.0.0.0/8")
  _, net5, _ := net.ParseCIDR("169.254.0.0/16")
  nets := [](*net.IPNet){net1, net2, net3, net4, net5}

  internalClient := &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
      Dial: func(network, addr string) (net.Conn, error) {
        conn, err := net.Dial(network, addr)

        if err != nil {
          return nil, err
        }

        ipStr, _, err := net.SplitHostPort(conn.RemoteAddr().String())
        // no idea how this could happen
        if err != nil {
          return nil, err
        }

        ip := net.ParseIP(ipStr)
        for _, net := range nets {
          if net.Contains(ip) {
            err := conn.Close()
            if err != nil {
              // wtf
            }
            return nil, errors.New("Illegal IP")
          }
        }

        return conn, nil
      },
    },
  }

  res, err := internalClient.Get(os.Args[1])

  if err != nil {
    fmt.Println(err)
    os.Exit(1)
  }

  fmt.Println(res.Status)
}

The above is very similar to the IO::Async version. Basically we set a global timeout on the client and then, in the code that connects the socket, vet the peer address before continuing.

Python

Perl is not really the “big dog” of dynamic languages anymore, so I figured I’d document how to do this with a more popular language. I mentioned that I’ve been toying with Python lately already, so it seemed like the most natural choice. If you know how to do this with other languages hit me up.

I looked at urllib2, urllib3, and requests, and it seemed like this kind of feature is impossible in these popular Python libraries without significant rewriting, duplication, or patches. I would love to be wrong here, and will update this post if someone can show me how to do what needs to be done. Otherwise, if you are using Python and need to do requests on behalf of the user, best of luck: you may end up writing your own HTTP client.

Also beware that at least urllib2 is helpful enough to provide support for file://. Make sure that if you are using urllib2, even indirectly, you remove support for untrusted handlers.


As with all security concerns, this is about measuring the cost of failure. There is no bug-free code; the cost of eternal vigilance and perfection is too high. The only other option I know of would be to spin up a completely separate virtual machine, isolated as much as possible from the rest of your system, in its own DMZ maybe. This is feasible, but it is certainly a high-cost alternative to something that’s not technically difficult.

I was surprised at how easy this was in both Go and IO::Async after striking upon the post-connection verification idea. Initially I had assumed that this was a nearly impossible to solve problem, because I assumed it needed to hook into DNS resolution directly.

The other big win in this modern day and age is that timeouts are easier to implement, and tend to be more trustworthy.

I hope this helps!


Appendix: Private Ranges

Please do not assume that this list is complete. I would love for it to be up-to-date and trustworthy, but that requires knowing all of the relevant RFCs. Here are the ones I know about and where they are from; almost all of these were informed by RFC6890, Sections 2.2.2 and 2.2.3. Note also that some of these may not be a security vulnerability, like 0.0.0.0/8, but generally I doubt that the extra check is going to be expensive enough to matter.

Address Block        Relevant RFC
0.0.0.0/8            RFC1122
10.0.0.0/8           RFC1918
100.64.0.0/10        RFC6598
127.0.0.0/8          RFC1122
169.254.0.0/16       RFC3927
172.16.0.0/12        RFC1918
192.0.0.0/24         RFC6890
192.0.0.0/29         RFC6333
192.0.2.0/24         RFC5737
192.88.99.0/24       RFC3068
192.168.0.0/16       RFC1918
198.18.0.0/15        RFC2544
198.51.100.0/24      RFC5737
203.0.113.0/24       RFC5737
240.0.0.0/4          RFC1112
255.255.255.255/32   RFC0919

The IPv6 ranges have a lot of weird stuff in them. One block, for example, was terminated already a couple years ago. Again, I suspect that for most of them it’s safe to block them and then remove the block later if you find that you need to (like if you absurdly end up on an IPv6 only network.)

Address Block        Relevant RFC
::1/128              RFC4291
::/128               RFC4291
64:ff9b::/96         RFC6052
::ffff:0:0/96        RFC4291
100::/64             RFC6666
2001::/23            RFC2928
2001::/32            RFC4380
2001:2::/48          RFC5180
2001:db8::/32        RFC3849
2001:10::/28         RFC4843
2002::/16            RFC3056
fc00::/7             RFC4193
fe80::/10            RFC4291

There are likely more. I think the definitive listings are here and here respectively, but some of the blocks in those listings don’t look private to me.

Posted Mon, Jul 25, 2016

A visit to the Workshop: Hugo/Unix/Vim integration

I write a lot of little tools and take pride in thinking of myself as a toolsmith. This is the first post of hopefully many specifically highlighting the process of the creation of a new tool.

I wanted to do some tag normalization and tag pruning on my blog, to make the tags more useful (eg instead of having all of dbic, dbix-class, and dbixclass just pick one.) Here’s how I did it.

As mentioned previously this blog is generated by Hugo. Hugo is excellent at generating static content; indeed that is its raison d’être. But there are places where it does not do some of the things that a typical blogging engine would.

To normalize tags I wanted to look at tags with their counts, and then associated filenames for a given tag. If I were using WordPress I’d navigate around the web interface and click edit and this use case would be handled. Not for me though, because I want to avoid the use of my web browser if at all possible. It’s bloated, slow, and limited.

Anatomy of an Article

Before I go much further, here is a super quick primer on what an article looks like in Hugo:

---
aliases: ["/archives/984"]
title: "Previous Post Updated"
date: "2009-07-24T00:59:37-05:00"
tags: ["book", "catalyst", "perl", "update"]
guid: "http://blog.afoolishmanifesto.com/?p=984"
---
Sorry about that guys, I didn't use **links** to make it clear which book I was
talking about. Usually I do that kind of stuff but the internet was sucky
(fixed!) so it hurt to look up links. Enjoy?

The top part is YAML. Hugo supports lots of different metadata formats but all of my posts use YAML. The part after the --- is the content, which is simply markdown.

Unix Style Tools

My first run at this general problem was to build a few simple tools. Here’s the one that would extract the metadata:

#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;

for my $file (@ARGV) {
  open my $fh, '<', $file;
  my $cnt = 0;
  while (<$fh>) {
    $cnt ++ if $_ eq "---\n";
    print $_ if $cnt < 2
  }
}

The above returns the YAML part, which can then be consumed by a tool with a YAML parser.

Then I built a tool on top of that, called tag-count:

#!/usr/bin/env perl

use 5.22.0;
use warnings;

use sort 'stable';

use experimental 'postderef';

use YAML;

my $yaml = `bin/metadata content/posts/*`;
my @all_data = Load($yaml);

my @tags = map(($_->{tags}||[])->@*, @all_data);
my %tags;

$tags{$_}++ for @tags;

for (sort { $tags{$b} <=> $tags{$a} } sort keys %tags) {
   printf "%3d $_\n", $tags{$_}
}

That works, but it’s somewhat inflexible. When I thought about how I wanted to get the filenames for a given tag I decided I’d need to modify the metadata script, or make the calling script a lot more intelligent.

Advanced Unix Tools

So the metadata extractor turned out to be too simple. At some point I had the realization that what I really wanted was a database of data about my posts that I could query with SQL. Tools built on top of that would be straightforward to build and their function would be clear.

So I whipped up what I call q:

#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;
use experimental 'postderef';

use DBI;
use File::Find::Rule;
use Getopt::Long;
my $sql;
my $formatter;
GetOptions (
   'sql=s' => \$sql,
   'formatter=s' => \$formatter,
) or die("Error in command line arguments\n");

use YAML::XS 'Load';

# build schema
my $dbh = DBI->connect('dbi:SQLite::memory:', undef, undef, {
      RaiseError => 1,
});

$dbh->do(<<'SQL');
   CREATE TABLE articles (
      title,
      date,
      guid,
      filename
   )
SQL

$dbh->do(<<'SQL');
   CREATE TABLE article_tag ( guid, tag )
SQL

$dbh->do(<<'SQL');
   CREATE VIEW _ ( guid, title, date, filename, tag ) AS
   SELECT a.guid, title, date, filename, tag
   FROM articles a
   JOIN article_tag at ON a.guid = at.guid
SQL

# populate schema
for my $file (File::Find::Rule->file->name('*.md')->in('content')) {
  open my $fh, '<', $file;
  my $cnt = 0;
  my $yaml = "";
  while (<$fh>) {
    $cnt ++ if $_ eq "---\n";
    $yaml .= $_ if $cnt < 2
  }
  my $data = Load($yaml);
  $data->{tags} ||= [];

  $dbh->do(<<'SQL', undef, $data->{guid}, $data->{title}, $data->{date}, $file);
      INSERT INTO articles (guid, title, date, filename) VALUES (?, ?, ?, ?)
SQL

  $dbh->do(<<'SQL', undef, $data->{guid}, $_) for $data->{tags}->@*;
      INSERT INTO article_tag (guid, tag) VALUES (?, ?)
SQL
}

# run sql
my $sth = $dbh->prepare($sql || die "pass some SQL yo\n");
$sth->execute(@ARGV);

# show output
for my $row ($sth->fetchall_arrayref({})->@*) {
   my $code = $formatter || 'join "\t", map $r{$_}, sort keys %r';
   say((sub { my %r = $_[0]->%*; eval $code })->($row))
}

With less than 80 lines of code I have a super flexible tool for querying my corpus! Here are the two tools mentioned above, as q scripts:

bin/tag_count:

#!/bin/dash

exec bin/q \
   --sql 'SELECT COUNT(*) AS c, tag FROM _ GROUP BY tag ORDER BY COUNT(*), tag' \
   --formatter 'sprintf "%3d  %s", $r{c}, $r{tag}'

bin/tag-files:

#!/bin/dash

exec bin/q --sql "SELECT filename FROM _ WHERE tag = ?" -- "$1"

And then this one, which I was especially pleased with because it was a use case I came up with after building q.

bin/chronological:

#!/bin/dash

exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
      --format 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'

I’m pleasantly surprised that this is fast. All of the above take under 150ms, even though the database is not persistent across runs.

Vim integration

Next I wanted to integrate q into Vim, so that when I wanted to see all posts tagged vim (or whatever) I could easily do so from within the current editor instance instead of spawning a new one.

:Tagged

To be clear, the simple way, where you spawn a new instance, is easily achieved like this:

$ vi $(bin/tag-files vim)

But I wanted to do that from within vim. I came up with some functions and commands to do what I wanted, but it was fairly painful. Vim is powerful, but it gets weird fast. Here’s how I made a :Tagged vim command:

function Tagged(tag)
  execute 'args `bin/tag-files ' . a:tag . '`'
endfunction
command -nargs=1 Tagged call Tagged('<args>')

:execute is a kind of eval. In vim there are a lot of different execution contexts and each one needs its own kind of eval; so this is the Ex-mode eval. :args {arglist} simply sets the argument list. And the magic above is that surrounding a string with backticks causes the command to be executed and the output interpolated, just like in shell or Perl.

I also added a window local version, using :arglocal:

function TLagged(tag)
  exe 'arglocal `bin/tag-files ' . a:tag . '`'
endfunction
command -nargs=1 TLagged call TLagged('<args>')

:Chrono

I also used the quickfix technique I blogged about before because it comes with a nice, easy to use window (see :cwindow) and I added a caption to each file. I did it for the chronological tool since that ends up being the largest possible list of posts. Making it easier to navigate is well worth it. Here’s the backing script:

#!/bin/dash

exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
           --format 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'

and then the vim command is simply:

command Chrono cexpr system('bin/quick-chrono')

:TaggedWord

Another command I added is called :TaggedWord. It takes the word under the cursor and loads all of the files with that tag into the argument list. If I can figure out how to bake it into CTRL-] (or something else like it) I will, as that would be more natural.

function TaggedWord()
  " add `-` as a "word" character
  set iskeyword+=45
  " save the current value of the @m register
  let l:tmp = @m
  normal "myiw
  call Tagged(@m)
  " restore
  set iskeyword-=45
  let @m = l:tmp
endfunction
command TaggedWord call TaggedWord()

I also made a local version of that, but I’ll leave the definition of that one to the reader as an exercise.

Tag Completion

As a final cherry on top I added a completion function for tags. This is probably the most user-friendly way I can keep using the right tags. When I write a post, and start typing tags, existing tags will autocomplete and thus will be more likely to be selected than to be duplicated. It’s not perfect, but it’s pretty good. Here’s the code:

au FileType markdown execute 'setlocal omnifunc=CompleteTags'
function! CompleteTags(findstart, base)
  " This is almost purely cargo culted from the vim doc
  if a:findstart
    let line = getline('.')
    let start = col('.') - 1
    " tags are word characters and -
    while start > 0 && line[start - 1] =~ '\w\|-'
      let start -= 1
    endwhile
    return start
  else
    " only run the command if we are on the "tags: [...]" line
    if match(getline('.'), "tags:") == -1
      return []
    endif

    " get list of tags that have current base as a prefix
    return systemlist('bin/tags ' . a:base . '%')
  endif
endfun

And here’s the referenced bin/tags:

#!/bin/dash

match="${1:-%}"
bin/q --sql 'SELECT tag FROM article_tag WHERE tag LIKE ? GROUP BY tag' -- "$match"

This little excursion was a lot of fun for me. I’ve always thought that Vim’s completion was black magic, but it’s really not. And the lightbulb moment about building an in memory SQLite database was particularly rewarding. I hope I inspired readers to write some tools as well; go forth, write!

Posted Wed, Jul 20, 2016

Development with Docker

I have not seen a lot of great examples of how to use Docker as a developer. There are tons of examples of how to build images; how to use existing images; etc. Writing code that will end up running inside of a container and more so writing code that gets compiled, debugged, and developed in a container is a bit tricker. This post dives into my personal usage of containers for development. I don’t know if this is normal or even good, but I can definitely vouch that it works.

First off, I am developing with an interpreted language most of the time. I still think these issues apply with compiled languages but they are easier to ignore and sweep under the rug. In this post I’ll show how I create layered images for developing a simple web service in Perl. It could be Ruby or Python of course; I just know Perl the best, so I’m using it for the examples.

Here is a simple Makefile to build the images:

api-image:
	docker build -f ./Dockerfile.api -t pw/api .

db-image:
	exit 1

perl-base-image:
	docker build -f ./Dockerfile.perl-base -t pw/perl-base .

I can build three images, one of which (db) is not-yet-defined but planned.

base

Here is Dockerfile.perl-base

FROM alpine:3.4

ADD cpanfile /root/cpanfile
RUN \
   apk add --update build-base wget perl perl-dev && \
   cpan App::cpm && \
   cd /root && \
   cpm -n --installdeps .

I use Alpine as the underlying image for my containers if possible, because it is almost as lightweight as it gets. Beware though, if you use it you may run into problems because it uses musl instead of glibc. I have only run into issues twice though, and one was a bug in the host kernel.

Next I add the cpanfile to the image. I could probably do something weird like build the Dockerfile and directly add the lines from the cpanfile to the Dockerfile, but that doesn’t seem worth the effort to me.

Finally, in a single layer (hence the && \’s), I:

  • Install Perl (which is a very recent 5.22)
  • Install cpm
  • Install the dependencies of the application

Basically what the above gives you is a cache layer where most of your dependencies are installed. This can hugely speed development while you are adding dependencies to the next layer. This methodology is also useful at deployment time, because new builds of the codebase need not rebuild the entire base image, but instead just one or more layers on top. The base image in this example is over 400 megs, and that’s with Alpine; if it were Ubuntu it would likely be over 700. The point is you don’t want to have to push that whole base layer to production for a spelling fix.

api

Here is Dockerfile.api

FROM pw/perl-base

ADD . /opt/api
WORKDIR /opt/api

RUN cpm -n --installdeps .

Sometimes I’ll add extra bits to the RUN directive. For instance, in the project I’m currently working on it’s:

RUN apk add perl-posix-strftime-compiler && cpanm --installdeps .

Because I needed Alpine’s patched POSIX::Strftime::Compiler. That will at some point be baked into the lower layer.

Refinements

If your project is sufficiently large, it is also likely worth it to break api into two layers. One called, for example, staging, which is almost exactly the same as the base layer, but its FROM is your base. api then becomes just the ADD and WORKDIR directives.
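Concretely, the split would look something like this (a sketch; the file and image names are made up):

# Dockerfile.staging -- same shape as the base layer, but built FROM it
FROM pw/perl-base

ADD cpanfile /root/cpanfile
RUN cd /root && cpm -n --installdeps .

# Dockerfile.api then shrinks to just
FROM pw/staging

ADD . /opt/api
WORKDIR /opt/api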

Another pretty cool refinement is to use docker run to build images. If you have special build requirements this is super handy. A couple reasons why one might need this would include needing to run multiple programs at once during the build, or needing to mount code that will not be added directly to an image. Here’s how it’s done:

FROM=pw/stage2
TMP_DIR=$(mktemp -td tmp.$1.XXXXXXXX)
TMPNAME=$(basename "$TMP_DIR")  # any unique name for the temporary container

# start container
docker run -d \
   --name $TMPNAME \
   --volume $TMP_DIR:/tmp \
   $FROM /sbin/init

# build
docker exec $TMPNAME build --my /code

# save to pw/api
docker commit -m "build --my /code" $TMPNAME pw/api
docker rm -f $TMPNAME
sudo rm -rf $TMP_DIR

Both of these refinements are arguably gross, but they really help speed development and solve problems, so until there are better ways, I’m happy with them.

Running

The above is a useful workflow for building your images, but that does not answer how the containers are used during development. There are a couple pieces to the answer there. First is this little script, which I placed in maint/dev-api:

#!/bin/dash

exec docker run --rm \
                --link some-postgres:db \
                --publish 5000:5000 \
                --user "$(id -u)" \
                --volume "$(pwd):/opt/api" \
                pw/api "$@"

The --link and --publish directives are sorta ghetto. At some point I’ll make the script dispatch based on the arguments and only link or publish if needed.

If possible I always use a non-root user, hence the --user directive. It is probably silly, but you almost never need root anyway, so you might as well not give it to the container. This has the nice side effect of ensuring that any files created from the container in a volume have the right owner.

The --volume should be clear: it replaces the code you built into the image with the code that’s on your laptop, without requiring a rebuilt image.

The other part to make this all work are a few more directives in the Makefile:

prepare-migrations:
	maint/dev-api perl -Ilib bin/update-database

run-migrations:
	docker run --rm --link some-postgres:db pw/api perl -Ilib bin/update-database 1

run-db:
	docker run --name some-postgres -d postgres

rm-db:
	docker rm -f some-postgres

I haven’t gotten around to creating a database container; I’m just using the official docker one for now. I will eventually replicate it for my application in a more lightweight fashion, but this helps me get up and get going. I wouldn’t have made the rm-db directive except the docker tab completion seems to be pretty terrible, but the make tab completion is perfect.

run-migrations is a little weird. It requires a complete rebuild just to update some DDL; but I believe it will be worth it in the long term. I suspect that I’ll be able to push the api container to some host, run run-migrations, and be done, instead of needing a checkout of the code on the host.

Linking

One of the details above that I haven’t gone into is the --link directive. This sets up the container so that it has access to the other container, with some environment variables set for the exposed ports in the linked container. On the face of it, this is just a way to connect two containers. Here is how I’m connecting from a script that deploys database code:

#!/usr/bin/env perl

use 5.22.0;
use warnings;

use DBIx::RetryConnect 'Pg';
use PW::Schema;
my $s = PW::Schema->connect(
 "dbi:Pg:dbname=postgres;host=$ENV{DB_PORT_5432_TCP_ADDR}",
 'postgres',
 $ENV{DB_ENV_POSTGRES_PASSWORD},
);

Notice that I simply use some environment variables that follow a fairly obvious pattern (a pattern that is easier to discover by linking a container that just runs env than by reading the docs.)

One other subtle detail is the use of DBIx::RetryConnect. With containers it is much more common to start all of your containers concurrently, versus with typical init systems or even virtual machines. This means baking retries into your applications, as it stands today, is a requirement. Either that or you add stupid sleep statements and hope nothing ever gets run on an overloaded machine.
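If you would rather not pull in the module, the hand-rolled equivalent is just a loop; this is a sketch, with arbitrary attempt and sleep numbers:

use DBI;

my $dbh;
for my $attempt (1 .. 10) {
   $dbh = eval {
      DBI->connect(
         "dbi:Pg:dbname=postgres;host=$ENV{DB_PORT_5432_TCP_ADDR}",
         'postgres',
         $ENV{DB_ENV_POSTGRES_PASSWORD},
         { RaiseError => 1, PrintError => 0 },
      )
   };
   last if $dbh;
   warn "database not up yet (attempt $attempt); retrying\n";
   sleep 2;
}
die "database never came up\n" unless $dbh;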

Refinements

Linking is pretty cool. For those who haven’t investigated this space much, linking seems like some cool magic “thing.” Linking is actually a builtin service discovery method for allowing containers to know about each other. But linking has a major drawback: to link containers in docker you have to start the containers serially. This is because links are resolved at container creation time. Worse yet you can’t change the environment variables of a running program, so links cannot be updated. This is at the very least a hassle because it introduces a synthetic, implied ordering to the starting of containers.

You can resolve the ordering problem with docker network:

# run API container
docker run -d \
   --name $NAME \
   pw/api www

# add to network
docker network create pw
docker network connect pw $NAME

# run db container
docker run --name db -d postgres
docker network connect pw db

Order no longer matters and you have much more flexibility with how you do discovery. But now you need to make a decision about discovery, as the environment variables will no longer be magically set for you. I strongly believe that this is where anyone doing anything moderately serious will end up anyway. The serialization of startup is just too finicky to be seriously considered.

I haven’t done enough with service discovery myself to recommend any path forward, but knowing the name to search for should give you plenty of rope.


I hope the ideas and examples above help anyone who is grappling with how to use Docker. Any criticisms or other ideas are welcome.

Posted Mon, Jul 18, 2016

Set-based DBIx::Class

This was originally posted to the 2012 Perl Advent Calendar. I refer people to this article so often that I decided to repost it here in case anything happens to the server it was originally hosted on.


I’ve been using DBIx::Class for a few years, and I’ve been part of the development team for just a little bit less. Three years ago I wrote a Catalyst Advent article about the five DBIx::Class::Helpers, which have since ballooned to twenty-four. I’ll be mentioning a few helpers in this post, but the main thing I want to describe is a way of using DBIx::Class that results in efficient applications as well as reduced code duplication.

(Don’t know anything about DBIx::Class? Want a refresher before diving in more deeply? Maybe watch my presentation on it, or, if you don’t like my face, try this one.)

The thesis of this article is that when you write code to act on things at the set level, you can often leverage the database’s own optimizations and thus produce faster code at a lower level.

Set Based DBIx::Class

The most important feature of DBIx::Class is not the fact that it saves you time by allowing you to sidestep database incompatibilities. It’s not that you never have to learn the exact way to paginate correctly with SQL Server. It isn’t even that you won’t have to write DDL for some of the most popular databases. Of course DBIx::Class does do these things. Any ORM worth its weight in salt should.

Chaining

The most important feature of DBIx::Class is the ResultSet. I’m not an expert on ORMs, but I’ve yet to hear of another ORM which has an immutable (if it weren’t for the fact that there is an implicit iterator akin to each %foo it would be 100% immutable. It’s pretty close though!) query representation framework. The first thing you must understand to achieve DBIx::Class mastery is ResultSet chaining. This is basic but critical.

The basic pattern of chaining is that you can do the following and not hit the database:

$resultset->search({
   name => 'frew',
})->search({
   job => 'software engineer',
})

What the above implies is that you can add methods to your resultsets like the following:

sub search_by_name {
   my ($self, $name) = @_;

   $self->search({ $self->current_source_alias . ".name" => $name })
}

sub is_software_engineer {
   my $self = shift;

   $self->search({
      $self->current_source_alias . ".job" => 'software engineer',
   })
}

And then the query would become merely

$resultset->search_by_name('frew')->is_software_engineer

(microtip: use DBIx::Class::Helper::ResultSet::Me to make defining searches as above less painful.)

Relationship Traversal

The next thing you need to know is relationship traversal. This can happen two different ways, and to get the most code reuse out of DBIx::Class you’ll need to be able to reach for both when the time arises.

The first is the more obvious one:

$person_rs->search({
   'job.name' => 'goblin king',
}, {
   join => 'job',
})

The above finds person rows that have the job “goblin king.”

The alternative is to use related_resultset in DBIx::Class::ResultSet:

$job_rs->search_by_name('goblin_king')
       ->related_resultset('person')

The above generates the same query, but allows you to use methods that are defined on the job resultset.

Subqueries

Subqueries are less important for code reuse and more important in avoiding incredibly inefficient database patterns. Basically, they allow the database to do more on its own. Without them, you’ll end up asking the database for data, then you’ll send that data right back to the database as part of your next query. It’s not only pointless network overhead but also two queries.

Here’s an example of what not to do in DBIx::Class:

my @failed_tests = $tests->search({
   pass => 0,
})->all;

my @not_failed_tests = $tests->search({
  id => { -not_in => [map $_->id, @failed_tests] }, # XXX: DON'T DO THIS
});

If you got enough failed tests back, this would probably just error. Just Say No to inefficient database queries:

my $failed_tests = $tests->search({
   pass => 0,
})->get_column('id')->as_query;

my @not_failed_tests = $tests->search({
  id => { -not_in => $failed_tests },
});

This is much more efficient than before, as it’s just a single query and lets the database do what it does best and gives you what you exactly want.

Christmas!

Ok so now you know how to reuse searches as much as is currently possible. You understand the basics of subqueries in DBIx::Class and how they can save you time. My guess is that you actually already knew that. “This wasn’t any kind of ninja secret, fREW! You lied to me!” I’m sorry, but now we’re getting to the real meat.

Correlated Subqueries

One of the common, albeit expensive, usage patterns I’ve seen in DBIx::Class is using N + 1 queries to get related counts. The idea is that you do something like the following:

my @data = map +{
   %{ $_->as_hash },
   friend_count => $_->friends->count, # XXX: BAD CODE, DON'T COPY PASTE
}, $person_rs->all

Note that the $_->friends->count is a query to get the count of friends. The alternative is to use correlated subqueries. Correlated subqueries are hard to understand and even harder to explain. The gist is that, just like before, we are just using a subquery to avoid passing data to the database for no good reason. This time we are just going to do it for each row in the database. Here is how one would do the above query, except as promised, with only a single hit to the database:

my @data = map +{
   %{ $_->as_hash },
   friend_count => $_->get_column('friend_count'),
}, $person_rs->search(undef, {
   '+columns' => {
      friend_count => $friend_rs->search({
         'friend.person_id' =>
            { -ident => $person_rs->current_source_alias . ".id" },
      }, {
        alias => 'friend',
      })->count_rs->as_query,
   },
})->all

There are only two new things above. The first is -ident. All -ident does is tell DBIx::Class “this is the name of a thing in the database, quote it appropriately.” In the past, before -ident, people would have written queries like this:

'friend.person_id' => \' = foo.id' # don't do this, it's silly

So if you see something like that in your code base, change it to -ident as above.

The next new thing is the alias => 'friend' directive. This merely ensures that the inner rs has its own alias, so that you have something to correlate against. If that doesn’t make sense, just trust me and cargo cult for now.

This adds a virtual column, which is itself a subquery. The column is, basically, $friend_rs->search({ 'friend.person_id' => $_->id })->count, except it’s all done in the database. The above is horrible to recreate every time, so I made a helper: DBIx::Class::Helper::ResultSet::CorrelateRelationship. With the helper the above becomes:

my @data = map +{
   %{ $_->as_hash },
   friend_count => $_->get_column('friend_count'),
}, $person_rs->search(undef, {
   '+columns' => {
      friend_count => $person_rs->correlate('friend')->count_rs->as_query
   },
})->all

::ProxyResultSetMethod

Correlated Subqueries are nice, especially given that there is a helper to make creating them easier, but it’s still not as nice as we would like it. I made another helper which is the icing on the cake. It encourages more forward-thinking DBIx::Class usage with respect to resultset methods.

Let’s assume you need friend count very often. You should make the following resultset method in that case:

sub with_friend_count {
   my $self = shift;

   $self->search(undef, {
      '+columns' => {
         friend_count => $self->correlate('friend')->count_rs->as_query
      }
   })
}

Now you can just do the following to get a resultset with a friend count included:

$person_rs->with_friend_count

But to access said friend count from a result you’ll still have to use ->get_column('friend_count'), which is a drag since using get_column on a DBIx::Class result is nearly using a private method. That’s where my helper comes in. With DBIx::Class::Helper::Row::ProxyResultSetMethod, you can use the ->with_friend_count method from your row methods, and better yet, if you used it when you originally pulled data with the resultset, the result will use the data that it already has! The gist is that you add this to your result class:

__PACKAGE__->load_components(qw( Helper::Row::ProxyResultSetMethod ));
__PACKAGE__->proxy_resultset_method('friend_count');

and that adds a friend_count method on your row objects that will correctly proxy to the resultset or use what it pulled or cache if called more than once!
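In other words, usage ends up looking something like this (assuming the with_friend_count resultset method defined above):

my $person = $person_rs->with_friend_count->first;

# no extra query here: the proxied method reuses the column the
# resultset already fetched
print $person->friend_count;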

::ProxyResultSetUpdate

I have one more, small gift for you. Sometimes you want to do something when either your row or resultset is updated. I posit that the best way to do this is to write the method in your resultset and then proxy to the resultset from the row. If you force your API to update through the result you are doing N updates (one per row), which is inefficient. My helper simply needs to be loaded:

__PACKAGE__->load_components(qw( Helper::Row::ProxyResultSetUpdate ));

and your results will use the update defined in your resultset.

Don’t Stop!

This isn’t all! DBIx::Class can be very efficient and also reduce code duplication. Whenever you have something that’s slow or bound to result objects, think about what you could do to leverage your amazing storage layer’s speed (the RDBMS) and whether you can push the code down a layer to be reused more.

Posted Sat, Jul 16, 2016

Investigation: Why Can't Perl Read From TMPDIR?

On Wednesday afternoon my esteemed colleague Mark Jason Dominus (who already blogged this very story, but from his perspective), showed me that he had run into a weird issue. Here was how it manifested:

$ export TMPDIR='/mnt/tmp'
$ env | grep TMPDIR
TMPDIR=/mnt/tmp
$ /usr/bin/perl -le 'print $ENV{TMPDIR}'

So to be clear, nothing was printed by Perl.

Another strange detail was that it happened in our development sandboxes, but not in production. I quickly reproduced it in my sandbox and verified with strace that the env var was being set: (reformatted for readability)

$ strace -v -etrace=execve perl -le'print $ENV{TMPDIR}'
execve("/usr/bin/perl", ["perl", "-leprint $ENV{TMPDIR}"], [
  "HOME=/home/frew",
  "LANG=en_US.UTF-8",
  "LC_ALL=en_US.UTF-8",
  "LESSCLOSE=/usr/bin/lesspipe%s %"...,
  "LESSOPEN=| /usr/bin/lesspipe %s",
  "LOGNAME=frew",
  "LS_COLORS=rs=0:di=01;34:ln=01;36"...,
  "MAIL=/var/mail/frew",
  "NODE_PATH=/usr/lib/nodejs:/usr/l"...,
  "PATH=/usr/local/sbin:/usr/local/"...,
  "PWD=/home/frew",
  "SHELL=/bin/bash",
  "SHLVL=1",
  "SSH_AUTH_SOCK=/tmp/ssh-bbEAG2701"...,
  "SSH_CLIENT=10.30.1.183 22976 22",
  "SSH_CONNECTION=10.30.1.183 22976"...,
  "SSH_TTY=/dev/pts/2",
  "STARTERVIEW=/var/starterview",
  "TERM=screen-256color",
  "TMPDIR=/mnt/tmp",
  "USER=frew",
  "_=/usr/bin/strace"
]) = 0

It should be obvious that TMPDIR is included in the execve call above. I knew that there had been a recent security patch related to environment variables, so I ran apt-get upgrade in my sandbox and it fixed the issue! But in mjd’s sandbox he had the same exact version of Perl (verified by running sha1sum on /usr/bin/perl.) My sandbox is a local docker machine and his is an EC2 instance, so maybe something there could be causing an issue.

My next idea was to ask around in #p5p, the channel on irc.perl.org where people who hack on the core Perl code hang out. I’m crediting the people who had the first idea for a given thing to check. There was a lot of repetition, so I’ll spare you and only list the first time something is mentioned.

Lukas Mai aka Mauke chimed in quickly saying that I should:

  • print the entire environment (perl -E'say "$_=$ENV{$_}" for keys %ENV')
  • use the perl debugger (PERLDB_OPTS='NonStop AutoTrace' perl -d -e0)
  • use ltrace

The first two of those were non-starters. Nothing interesting happened. Here is the unabbreviated ltrace of the issue in question:

$ ltrace perl -le'print $ENV{TMPDIR}'
__libc_start_main(0x400c70, 2, 0x7fff1fa24e88, 0x400f30, 0x400fc0 <unfinished ...>
Perl_sys_init3(0x7fff1fa24d7c, 0x7fff1fa24d70, 0x7fff1fa24d68, 0x400f30, 0x400fc0) = 0
__register_atfork(0x7fad644a3c10, 0x7fad644a3c50, 0x7fad644a3c50, 0, 0x7fff1fa24ca0) = 0
perl_alloc(0, 0x7fad6440efb8, 0x7fad6440ef88, 48, 0x7fff1fa24ca0) = 0x2551010
perl_construct(0x2551010, 0, 0, 0, 0)               = 0x2558f60
perl_parse(0x2551010, 0x400eb0, 2, 0x7fff1fa24e88, 0 <unfinished ...>
Perl_newXS(0x2551010, 0x40101c, 0x7fad64550f80, 0x7fff1fa24b90, 0x7fad645532c0) = 0x2571b28
<... perl_parse resumed> )                          = 0
perl_run(0x2551010, 0x2551010, 0, 0x2551010, 0
)     = 0
Perl_rsignal_state(0x2551010, 0, 0x2551288, 0x2551010, 0x7fff1fa24c50) = -1
Perl_rsignal_state(0x2551010, 1, -1, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 2, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 3, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 4, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 5, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 6, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 7, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 8, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 1
Perl_rsignal_state(0x2551010, 9, 1, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 10, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 11, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 12, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 13, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 14, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 15, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 16, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 17, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 18, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 19, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 20, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 21, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 22, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 23, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 24, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 25, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 26, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 27, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 28, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 29, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 30, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 31, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 32, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = -1
Perl_rsignal_state(0x2551010, 33, -1, 0x7fad6408a1b5, 0x7fff1fa24cb0) = -1
Perl_rsignal_state(0x2551010, 34, -1, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 35, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 36, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 37, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 38, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 39, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 40, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 41, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 42, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 43, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 44, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 45, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 46, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 47, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 48, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 49, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 50, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 51, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 52, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 53, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 54, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 55, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 56, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 57, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 58, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 59, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 60, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 61, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 62, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 63, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 64, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 6, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 17, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 29, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
Perl_rsignal_state(0x2551010, 31, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
perl_destruct(0x2551010, 0, 0, 0x7fad6408a1b5, 0x7fff1fa24cb0) = 0
perl_free(0x2551010, 0xffffffff, 0x2551010, 0x7fad6440b728, 0x7fad6478e0c0) = 2977
Perl_sys_term(0x7fad6440b720, 0, 0x7fad6440b778, 0x7fad6440b728, 0x7fad6478e0c0) = 0
exit(0 <unfinished ...>
+++ exited (status 0) +++

I still have yet to have ltrace actually help me with debugging. More on that later.

Next Ricardo Jelly Bean Signes mentioned that I should try diffing the environment. As expected the only differences were TMPDIR being missing, and _ being /usr/bin/perl or /usr/bin/env respectively.

Dominic Hargreaves looked closely at the patch (which he had ported to the version of Perl in question) and verified that it shouldn’t be causing what we were seeing.

At this point I decided to attempt to bisect a build of Perl to figure out the cause of the problem. Here’s what I did:

git clone git://anonscm.debian.org/perl/perl.git -b wheezy
make -f debian/rules build

I ctrl-c’d the tests, since I knew Perl was built at that point. When I did TMPDIR=foo ./perl -E'say $ENV{TMPDIR}' it “worked” and printed foo. I tried this on a proper virtual machine, on my docker based sandbox, and on the metal of my laptop. None reproduced the problem. Bummer. I went home frustrated, without any answers.

The following morning I mentioned my progress in #p5p to see if anyone had any other ideas.

Todd Rinaldo verified that I wasn’t running perl under taint mode. I wasn’t, but that’s a great question. If you don’t know about taint mode, read the above. It could reasonably cause something like this. He also had me verify that env vars like TMPDIRA, TMPHAH, etc didn’t have the same issue (they did not.)

Matthew Horsfall had me compile and run the following code, to ensure that it worked like env. It did.

#include <unistd.h>
#include <stdio.h>

extern char **environ;

void main(void) {
  int i;

  for (i = 0; environ[i]; i++) {
    printf("%s\n", environ[i]);
  }
}

Matthew also verified what shell this happened under. I confirmed that it happened under both the GNU Bourne-Again Shell and the Debian Almquist Shell.

Next Andrew Main, more commonly known as Zefram, asked if I had a sitecustomize.pl. I did not.

Zefram next said I should try using gdb to inspect the running process. I needed some hand holding, but basically I did the following:

# install gdb
$ apt-get install gdb

# install debug headers
$ apt-get install libc6-dbg

$ gdb --args /usr/bin/perl -E 'say $ENV{TMPDIR}'
(gdb) break main
Breakpoint 1 at 0x41ca90
(gdb) run
Starting program: /usr/bin/perl perl -Esay\ \$ENV\{TMPDIR\}
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x000000000041ca90 in main ()
(gdb) p environ[0]
$1 = 0x7fffffffe4df "XDG_SESSION_ID=c2"
(gdb) p environ[1]
$2 = 0x7fffffffe4f1 "TERM=screen-256color"
(gdb) p environ[2]
$3 = 0x7fffffffe506 "DISPLAY=:0"
[ etc etc ]

I iterated over the entire array (till I got to an empty entry) and there was no TMPDIR. Zefram then had me verify that my EUID and my UID matched. I used both id and perl -E'say "$<:$>"' to show that they did match. Zefram then asked if LD_LIBRARY_PATH had the same problem as TMPDIR, and it did!

11:00:12      Zefram | something is cleansing the environment for security reasons

Andrew Rodland commonly known as hobbs linked me to a bug detailing and explaining the issue.

The subtle reason why Dominus didn’t figure this out in the beginning is that, unlike the issue above, the binary here is not actually setuid. Instead, it has what Linux calls capabilities, which are sortav root privileges broken down into discrete pieces. Sadly that means ls -l does not show them. In fact there is no flag to pass to ls to show them, so they are easily missed.

In our developer sandboxes we add a capability to /usr/bin/perl to allow it to listen on low ports, so that developers can access their web application without needing to run Apache or some other proxy. We have plans to add a proxy for performance reasons in development anyway, but in the meantime I plan on adding some rules with iptables and removing the capability, to resolve this issue.
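Since ls won’t show them, the way to actually see (and strip) file capabilities is getcap and setcap from libcap. The exact output format varies between versions, but it looks roughly like this:

$ getcap /usr/bin/perl
/usr/bin/perl = cap_net_bind_service+ep

$ sudo setcap -r /usr/bin/perl   # remove the capability again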

Here’s a funny side note to all of this: this capability has been added to our binary since 2013. Dominus ran into a problem with it Wednesday. Another coworker also ran into it two days later, for totally different reasons.

One more layer

One important thing I learned in this investigation is that there is this mostly invisible and unspoken layer: the dynamic linker. I vaguely knew that there was this thing that wires together binaries and their dynamic libraries, but I never really considered that there was more to it than that. The manpage of the dynamic linker has lots of details, but in this case the important section is:

   Secure-execution mode
       For security reasons, the effects of some environment variables are
       voided or modified if the dynamic linker determines that the binary
       should be run in secure-execution mode.  This determination is made
       by checking whether the AT_SECURE entry in the auxiliary vector (see
       getauxval(3)) has a nonzero value.  This entry may have a nonzero
       value for various reasons, including:

       *  The process's real and effective user IDs differ, or the real and
          effective group IDs differ.  This typically occurs as a result of
          executing a set-user-ID or set-group-ID program.

       *  A process with a non-root user ID executed a binary that conferred
          permitted or effective capabilities.

       *  A nonzero value may have been set by a Linux Security Module.

I have spent a little time while writing this post reading that manpage and playing with some of various options. This is kinda cool:

$ LD_DEBUG=all /bin/ls

The amount of output is significant, so I’ll leave running the above as an exercise for the reader.

Useful and (maybe?) not useful abstractions

The other thing that this investigation reinforced is my belief that not all abstractions and layers are important and useful. I have used strace countless times and almost every time I use it, it tells me what I need to know (“What port is this program listening on?”, “Where is this program’s config file?”, “What is this program blocking on?”). strace shows what system calls are being executed. To learn more read either some blog posts about strace or read the manpage.

Contrast that with ltrace. ltrace shows what library functions are being called. Bizarrely (to me) depending on the version of ltrace being run it can be either just a little bit shorter than the output of strace (that’s what happened while debugging above) or hugely more (on my laptop right now ltrace /usr/bin/perl -E'say $ENV{TMPDIR}' 2>&1 | wc -l is over six thousand, while the strace version is not even three hundred.) Maybe it depends on what debug symbols are installed? I don’t know. While it may be helpful to some to see this:

memmove(0x1e14e10, "print $ENV{TMPDIR}\n", 19)            = 0x1e14e10
__memcpy_chk(0x7ffd946385a1, 0x1e14c28, 5, 256)           = 0x7ffd946385a1
strlen("%ENV")                                            = 4
memchr("%ENV", ':', 4)                                    = 0
malloc(10)                                                = 0x1e16150

I suspect it is not important to most.

This is not to say that ltrace is worthless; it just is much more niche than strace. I would argue that strace is a tool worth using while writing code for almost any engineer. Yet in a decade of professional problem solving I have not been helped by ltrace.


I hope you enjoyed this. It was fun to experience and to learn about ld.so. Thanks go to all the people mentioned above. If you liked this but haven’t already read the post linked above, authored by MJD, go do that now.

Posted Thu, Jun 30, 2016

Reap slow and bloated plack workers

As mentioned before at ZipRecruiter we are trying to scale our system. Here are a couple ways we are trying to ensure we maintain good performance:

  1. Add timeouts to everything
  2. Have as many workers as possible

Timeouts

Timeouts are always important. A timeout that is too high will allow an external service to starve your users. A timeout that is too low will give up too quickly. No timeout is basically a timeout that is too high, no matter what. My previous post on this topic was about adding timeouts to MySQL. For what it’s worth, MySQL does have a default timeout, but it’s a year, so it’s what most people might call: too high.

Normally people consider timeouts for external services, but it turns out they are useful for our own servers as well. Sometimes people accidentally write code that can be slow in unusual cases, so while it’s fast 99.99% of the time, that last remaining 0.01% can be outage inducing by how much it can slow down code and consume web workers.

One way to add timeouts to code is to make everything asynchronous and tie all actions to clock events, so that you query the database and, if the query doesn’t come back before the clock event, you have some kind of error. This is all well and good, but it means that you suddenly need async versions of everything, and I have yet to see universal RDBMS support for async. If you need to go that route you are almost better off rewriting all of your code in Go.

The other option is to bolt on an external watchdog, very similar to the MySQL reaper I wrote about last time.

More Workers

Everywhere I have worked the limiting factor for more workers has been memory. There are a few basic things you can do to use as little memory as possible. First and foremost, with most of these systems you are using some kind of preforking server, so you load up as many libraries before the fork as possible. This will allow Linux (and nearly all other Unix implementations) to share a lot of the memory between the master and the workers. On our system, in production, most workers are sharing about half a gig of memory with the master. That goes a really long way when you have tens of workers.

The other thing you can do is attempt to not load lots of stuff into memory at all. Due to Perl’s memory model, when lots of memory is allocated, it is never returned to the operating system, and is instead reserved for later use by the process. Instead of slurping a whole huge file into memory, just incrementally process it.
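To make that concrete, the difference is between these two shapes of code (huge.log is a stand-in filename); the first keeps the whole file in the worker’s memory for the life of the process, the second never holds more than a line at a time:

# slurping: the whole file is allocated, and Perl keeps that memory around
my $everything = do { local (@ARGV, $/) = ('huge.log'); <> };

# streaming: memory use stays flat no matter how big the file is
open my $fh, '<', 'huge.log' or die $!;
while (my $line = <$fh>) {
  # ... process $line ...
}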

Lastly, you can add a stop gap solution that fits nicely in a reaper process. In addition to killing workers that are taking too long serving a single request, you can reap workers that have allocated too much memory.

smaps

Because of the sharing mentioned above, we really care about private (that is, not shared) memory more than anything else. Killing a worker because the master has gotten larger is definitely counterproductive. We can leverage Linux’s /proc/[pid]/smaps for this. The good news is that if you simply parse that file for a given worker and sum up the Private_Clean and Private_Dirty fields, you’ll end up with all of the memory that only that process has allocated. The bad news is that it can take a while. Greater than ten milliseconds seems typical; that means that adding it to the request lifecycle is a non-starter. This is why baking this into your plack reaper makes sense.
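
To make that concrete, here is roughly what summing those two fields by hand looks like; this is just an illustration, and the reaper below uses Linux::Smaps instead:

#!/usr/bin/perl
# a minimal sketch: sum Private_Clean and Private_Dirty for a single pid
use strict;
use warnings;

my $pid = shift or die "usage: $0 pid\n";

open my $fh, '<', "/proc/$pid/smaps"
  or die "cannot open /proc/$pid/smaps: $!";

my $private_kb = 0;
while (<$fh>) {
  # the relevant lines look like "Private_Dirty:        12 kB"
  $private_kb += $1 if /^Private_(?:Clean|Dirty):\s+(\d+)\s+kB/;
}

print "$private_kb kB private\n";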

Plack Reaper

The listing below is a sample of how to make a plack reaper to resolve the above issues. It uses USR1 for timeouts, to simply kill those workers. The worker is expected to have code to intercept USR1, log what request it was serving (preferably in the access log) and exit. USR2 is instead meant to allow the worker to finish serving its current request, if there is one, and then exit after. You can leverage psgix.harakiri for that.

We also use Parallel::Scoreboard, which is what Plack::Middleware::ServerStatus::Lite uses behind the scenes.

(Note that this is incredibly simplified from what we are actually using in production. We have logging, more robust handling of many various error conditions, etc.)

#!/usr/bin/perl

use strict;
use warnings;

use Linux::Smaps;
use Parallel::Scoreboard;
use JSON 'decode_json';

my $scoreboard_dir = '/tmp/' . shift;
my $max_private    = shift;

my $scoreboard = Parallel::Scoreboard->new(
  base_dir => $scoreboard_dir,
);

while (1) {
  my $stats = $scoreboard->read_all;

  for my $pid (keys %$stats) {
    my %status = %{decode_json($stats->{$pid})};

    # undefined time will become zero, age will be huge, should get killed
    my $age = time - $status{time};

    kill USR1 => $pid
      if $age > timeout(\%status);

    my $smaps = Linux::Smaps->new($pid);

    my $private = $smaps->private_clean + $smaps->private_dirty;
    kill USR2 => $pid
      if $private > $max_private;
  }

  sleep 1;
}

sub timeout {
  return 10 * 60 if shift->{method} eq 'POST';
  2 * 60
}
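
For completeness, here is a rough, hypothetical sketch of the worker side as Plack middleware. Our real middleware also publishes its status (time, method, and so on) to the scoreboard and has real logging; none of the names below are from our actual code:

package MyApp::Middleware::Reapable;

use strict;
use warnings;

use parent 'Plack::Middleware';

my $finish_then_exit = 0;

sub call {
  my ($self, $env) = @_;

  # USR1: the reaper thinks we are over the timeout; note what we were doing
  # and exit immediately
  local $SIG{USR1} = sub {
    warn "reaped while serving $env->{REQUEST_METHOD} $env->{PATH_INFO}\n";
    exit 1;
  };

  # USR2: finish this request, then ask the server to kill us via harakiri
  local $SIG{USR2} = sub { $finish_then_exit = 1 };

  my $res = $self->app->($env);

  $env->{'psgix.harakiri.commit'} = 1
    if $finish_then_exit && $env->{'psgix.harakiri'};

  $res;
}

1;

Note that the localized handlers are only installed while a request is being served, so a USR2 delivered to an idle worker just terminates it with the default signal disposition, which is what we want anyway.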

I am very pleased that we have the above running in production and increasing our effective worker count. Maybe next time I’ll blog about our awesome logging setup, or how I (though not ZipRecruiter) think strictures.pm should be considered harmful.

Until next time!

Posted Wed, Jun 29, 2016

AWS Retirement Notification Bot

If you use AWS a lot you will be familiar with the “AWS Retirement Notification” emails. At ZipRecruiter, when we send our many emails, we spin up tens of servers in the middle of the night. There was a period for a week or two where I’d wake up to one or two notifications each morning. Thankfully those servers are totally ephemeral. By the time anyone even noticed the notification the server was completely gone. Before I go further, here’s an example of the beginning of that email (the rest is static:)

Dear Amazon EC2 Customer,

We have important news about your account (AWS Account ID: XXX). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-deadbeef) in the us-east-1 region. Due to this degradation, your instance could already be unreachable. After 2016-07-06 02:00 UTC your instance, which has an EBS volume as the root device, will be stopped.

Note that the identifier there is totally not useful to a human being. Every time we got this notification someone on my team would log into the AWS console, look up the server, and email the team: “the server is gone, must have been one of the email senders” or maybe “the server is an email sender and will be gone soon anyway.”

Like many good programmers I am lazy, so I thought to myself: “I should write an email bot to automate what we are doing!”

Behold:

#!/usr/bin/perl

use strict;
use warnings;

use Mail::IMAPClient;
use Email::Address;
use Email::Sender::Simple qw(sendmail);
use Data::Dumper::Concise;
use Try::Tiny;

my ($from) = Email::Address->parse('Zip Email Bot <email-bot@ziprecruiter.com>');
my $imap = Mail::IMAPClient->new(
  Server   => 'imap.gmail.com',
  User     => $from->address,
  Password => $ENV{ZIP_EMAIL_BOT_PASS},
  Ssl      => 1,
  Uid      => 1,
) or die 'Cannot connect to imap.gmail.com as ' . $from->address . ": $@";

$imap->select( $ENV{ZIP_EMAIL_BOT_FOLDER} )
  or die "Select '$ENV{ZIP_EMAIL_BOT_FOLDER}' error: ", $imap->LastError, "\n";

for my $msgid ($imap->search('ALL')) {

  require Email::MIME;
  my $e = Email::MIME->new($imap->message_string($msgid));

  # if an error happens after this the email will be forgotten
  $imap->copy( 'processed', $msgid )
    or warn "Could not copy: $@\n";

  $imap->move( '[Gmail]/Trash', $msgid )
    or die "Could not move: $@\n";
  $imap->expunge;

  my @ids = extract_instance_list($e);

  next unless @ids;

  my $email = build_reply(
    $e, Dumper(instance_data(@ids))
  );

  try {
    sendmail($email)
  } catch {
    warn "sending failed: $_";
  };
}

# We ignore stuff in the inbox, stuff we care about gets filtered into another
# folder.
$imap->select( 'INBOX' )
  or die "Select 'INBOX' error: ", $imap->LastError, "\n";

my @emails = $imap->search('ALL');

if (@emails) {
  $imap->move( '[Gmail]/Trash', \@emails )
    or warn "Failed to cleanup inbox: " . $imap->LastError . "\n";
}
$imap->expunge;

$imap->logout
  or die "Logout error: ", $imap->LastError, "\n";


# A lot of this was copy pasted from Email::Reply; I'd use it except it has some
# bugs and I was recommended to avoid it.  I sent patches to resolve the bugs and
# will consider using it directly if those are merged and released.
# -- fREW 22Mar2016
sub build_reply {
  my ($email, $body) = @_;

  my $response = Email::MIME->create;

  # Email::Reply stuff
  $response->header_str_set(From => "$from");
  $response->header_str_set(To => $email->header('From'));

  my ($msg_id) = Email::Address->parse($email->header('Message-ID'));
  $response->header_str_set('In-Reply-To' => "<$msg_id>");

  my @refs = Email::Address->parse($email->header('References'));
  @refs = Email::Address->parse($email->header('In-Reply-To'))
    unless @refs;

  push @refs, $msg_id if $msg_id;
  $response->header_str_set(References => join ' ', map "<$_>", @refs)
    if @refs;

  my @addrs = (
    Email::Address->parse($email->header('To')),
    Email::Address->parse($email->header('Cc')),
  );
  @addrs = grep { $_->address ne $from->address } @addrs;
  $response->header_str_set(Cc => join ', ', @addrs) if @addrs;

  my $subject = $email->header('Subject') || '';
  $subject = "Re: $subject" unless $subject =~ /\bRe:/i;
  $response->header_str_set(Subject => $subject);

  # generation of the body
  $response->content_type_set('text/html');
  $response->body_str_set("<pre>$body</pre>");

  $response
}

sub extract_instance_list {
  my $email = shift;

  my %ids;
  $email->walk_parts(sub {
    my $part = shift;
    return if $part->subparts; # multipart
    return if $part->header('Content-Disposition') &&
      $part->header('Content-Disposition') =~ m/attachment/;

    my $body = $part->body;

    while ($body =~ m/\b(i-[0-9a-f]{8,17})\b/gc) {
      $ids{$1} = undef;
    }
  });

  return keys %ids;
}

sub find_instance {
  my $instance_id = shift;

  my $res;
  # could infer region from the email but this is good enough
  for my $region (qw( us-east-1 us-west-1 eu-west-1 )) {
    $res = try {
      # theoretically we could fetch multiple ids at a time, but if we get the
      # "does not exist" exception we do not want it to apply to one of many
      # instances.
      _ec2($region)->DescribeInstances(InstanceIds => [$instance_id])
        ->Reservations
    } catch {
      # we don't care about this error
      die $_ unless m/does not exist/m;
      undef
    };

    last if $res;
  }

  return $res;
}

sub instance_data {
  return unless @_;
  my %ids = map { $_ => 'not found (no longer exists?)' } @_;

  for my $id (keys %ids) {
    my $res = find_instance($id);

    next unless $res;

    my ($i, $uhoh) = map @{$_->Instances}, @$res;

    next unless $i;

    warn "multiple instances found for one instance id, wtf\n" if $uhoh;

    $ids{$id} = +{
      map { $_->Key => $_->Value }
      @{$i->Tags}
    };
  }

  return \%ids;
}


my %ec2;
sub _ec2 {
  my $region = shift;

  require Paws;

  $ec2{$region} ||= Paws->service('EC2', region => $region );

  $ec2{$region}
}

There’s a lot of code there, but this is the meat of it:

my @ids = extract_instance_list($e);

next unless @ids;

my $email = build_reply(
  $e, Dumper(instance_data(@ids))
);

try {
  sendmail($email)
} catch {
  warn "sending failed: $_";
};

And then the end result is a reply-all to the original email that looks something like this:

Subject: Re: [Retirement Notification] Amazon EC2 Instance scheduled for retirement.

{
  "i-8c288e74" => {
    Level => "prod",
    Name => "send-22",
    Team => "Search"
  }
}

The code above is cool, but the end result is awesome. I don’t log into the AWS console often, and the above means I get to log in even less. This is the kind of tool I love; for the 99% case, it is quiet and simplifies all of our lives. I can see the result on my phone; I don’t have to connect to a VPN or ssh into something; it just works.

colophon

The power went out in the entire city of Santa Monica today, but I was able to work on this blog post (including seeing previews of how it would render) and access the emails that it references thanks to both my email setup and my blog setup. Hurray for software that works without the internet!

Posted Wed, Jun 22, 2016

Vim: Goto File

Vim has an awesome feature that I think is not shown off enough. It’s pretty easy to use and configure, but thankfully many languages have a sensible configuration out of the box.

Vim has this feature that opens a file when you press gf over a filename. On the face of it, it’s only sort of useful. There are a couple settings that make this feature incredibly handy.

path

First and foremost, you have to set your path. Typically when you open a Perl script or module in vim, the path is set to something like this:

  • $(pwd)
  • /usr/include
  • $PERL5LIB
  • And Perl’s default @INC

It’s a good idea to add the path of your current project, for example:

:set path+=lib

So on a typical Linux system, you can type out zlib.h and press gf over it and pull up the zlib headers. The next feature is what really makes it powerful.

suffixesadd and includeexpr

The more basic of the two options is suffixesadd. It is simply a list of suffixes to attempt to add to the filename. So in the example above, if you :set suffixesadd=.h and then type zlib and press gf on the word, you’ll pull up the header files for zlib. That’s too basic for most modern programming environments though. Here’s the default includeexpr for me when I open a perl script:

substitute(substitute(substitute(v:fname,'::','/','g'),'->*','',''),'$','.pm','')

Let’s unpack that to make sure we see what’s going on. This may be subtly incorrect syntax, but that’s fine. The point is to communicate what is happening above.

to_open = v:fname

# replace all :: with /
to_open = substitute(to_open,'::','/','g')

# remove any method call (like ->foo)
to_open = substitute(to_open,'->*','','')

# append a .pm
to_open = substitute(to_open,'$','.pm','')

With the above we can find the filename to open. This is the default. You can do even better if you put in a little effort. Here is an idea I’d like to try when I get some time: call a function as the expression, and in that function, if the fname contains ->resultset(...), return the namespaced ResultSet class. I’d need to tweak isfname to allow selecting weird characters, and maybe that would be more problematic than it’s worth, but it’s hard to know before you try. Could be really handy!

Even if you don’t go further with this idea, consider using gf more often. I personally use it (plus CTRL-O as a “back” command) to browse repos and even the Perl modules they depend on.

Posted Tue, Jun 21, 2016

Staring into the Void

Monday of this week either Gmail or OfflineIMAP had a super rare transient bug and duplicated all of the emails in my inbox, twice. I had three copies of every email! It was annoying, but I figured it would be pretty easy to fix with a simple Perl script. I was right; here’s how I did it:

#!/usr/bin/env perl

use 5.24.0;
use warnings;

use Email::MIME;
use IO::All;

my $dir = shift;

my @files = io->dir($dir)->all_files;

my %message_id;

for my $file (@files) {
   my $message_id = Email::MIME->new( $file->all )->header_str('message-id');
   unless ($message_id) {
      warn "No Message-ID for $file\n";
      next;
   }

   $message_id{$message_id} ||= [];
   push $message_id{$message_id}->@*, $file->name;
}

for my $message_id (keys %message_id) {
   my ($keep, @remove) = $message_id{$message_id}->@*;

   say "# keep $keep";
   say "rm $_" for @remove;
}

After running the script above I could eyeball the output and be fairly confident that I was not accidentally deleting everything. Then I just re-ran it and piped the output to sh. Et voilà! The inbox was back to normal, and I felt good about myself.

Then I got nervous

Sometimes when you are programming, you solve real world problems, like what day you’ll get married. Other times, you’re just digging yourself out of the pit that is everything that comes with programming. This is one of those times. I’ve mentioned my email setup before, and I am still very pleased with it. But I have to admit to myself that this problem would never have happened if I were using the web interface that Gmail exposes.

See, while I can program all day, it’s not actually what I get paid to do. I get paid to solve problems, not make more of them and then fix them with code. It’s a lot of fun to write code; when you write code you are making something and you get the nearly instant gratification of seeing it work.

I think code can solve many problems, and is worth doing for sure. In fact I do think the code above is useful and was worth writing and running. But it comes really close to what I like to call “life support” code. Life support code is not code that keeps a person living. Life support code is code that hacks around bugs or lack of features or whatever else, to keep other code running.

No software is perfect; there will always be life support code, incidental complexity, lack of idempotence, and bugs. But that doesn’t mean I can stop struggling against this fundamental truth and just write or support bad software. I will continue to attempt to improve my code and the code around me, but I think writing stuff like the above is, to some extent, a warning sign.

Don’t just mortgage your technical debt; pay it down. Fix the problems. And keep the real goal in sight; you do not exist to pour your blood into a machine: solve real problems.

Posted Thu, Jun 16, 2016

Vim Session Workflow

Nearly a year ago I started using a new vim workflow leveraging sessions. I’m very pleased with it and would love to share it with anyone who is interested.

Session Creation

This is what really made sessions work for me. Normally in vim, when you store a session (which captures almost the entire state of the editor: all open windows, buffers, etc) you have to do it by hand, with the :mksession command. While that works, it means that you are doing that all the time. Tim Pope released a plugin called Obsession which resolves this issue.

When I use Obsession I simply run this command if I start a new project: :Obsess ~/.vvar/sessions/my-cool-thing. That will tell Obsession to automatically keep the session updated. I can then close vim, and if I need to pick up where I left off, I just load the session.

Lately, because I’m dealing with stupid kernel bugs, I have been using :mksession directly as I cannot seem to efficiently make session updating reliable.

Session Loading

I store my sessions (and really all files that vim generates to function) in a known location. The reasoning here is that I can then enumerate and select a session with a tool. I have a script that uses dmenu to display a list, but you could use one of those hip console based selectors too. Here’s my script:

#!/bin/zsh

exec gvim -S "$(find ~/.vvar/sessions -maxdepth 1 -type f | dmenu)"

That simply starts gvim with the selected session. If the session was created with Obsession, it will continue to automatically update.


This allows me to easily stop working on a given project and pick up exactly where I left off. It would be perfect if my computer would stop crashing; hopefully it’s perfect for you!

Posted Thu, Jun 9, 2016

DBI Caller Info

At ZipRecruiter we have a system for appending metadata to queries generated by DBIx::Class. About a month ago I posted about bolting timeouts onto MySQL and in the referenced code I mentioned parsing said metadata. We are depending on that metadata more and more to set accurate timeouts on certain page types.

Adding Metadata to DBI Queries

Because of our increased dependence on query metadata, I decided today that I’d look into setting the metadata at the DBI layer instead of the DBIx::Class layer. This not only makes debugging certain queries easier, but more importantly allows us to give extra grace to queries coming from certain contexts.

First we define the boilerplate packages:

package ZR::DBI;

use 5.14.0;
use warnings;

use base 'DBI';

use ZR::DBI::db;
use ZR::DBI::st;

1;
package ZR::DBI::st;

use 5.14.0;
use warnings;

use base 'DBI::st';

1;

Next we intercept the prepare method. In this example we only grab the innermost call frame. At work we not only walk backwards through the frames based on a regex on the filename, but also have a hash that adds extra data, like what controller and action are being accessed when in a web context.

package ZR::DBI::db;

use 5.14.0;
use warnings;

use base 'DBI::db';

use JSON::XS ();

sub prepare {
  my $self = shift;
  my $stmt = shift;

  # caller() with no arguments only returns the package, file, and line;
  # caller(1) also gives us the name of the subroutine that called prepare
  my ($class, $file, $line) = caller();
  my $sub = (caller(1))[3];

  $stmt .= " -- ZR_META: " . encode_json({
    class => $class,
    file  => $file,
    line  => $line,
    sub   => $sub,
  }) . "\n";

  $self->SUPER::prepare($stmt, @_);
}

1;

Finally use the subclass:

my $dbh = DBI->connect($dsn, $user, $password, {
    RaiseError         => 1,
    AutoCommit         => 1,

    RootClass          => 'ZR::DBI',
});

The drawback of the above is that it could be (and maybe is) defeating the caching of prepared statements. In our system that doesn’t seem to be very problematic, but I suspect it depends on the RDBMS and workload. Profile your system before blindly following these instructions.
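
On the consuming side (for example, the reaper mentioned above), pulling the metadata back out of a statement is mostly a regex. Here is a minimal sketch, assuming the exact format appended above; query_metadata is just an illustrative helper:

use strict;
use warnings;

use JSON::XS ();

sub query_metadata {
  my $sql = shift;

  return unless $sql =~ /-- ZR_META: (\{.*\})\s*$/m;

  # tolerate truncated or otherwise mangled JSON by returning nothing
  return eval { JSON::XS::decode_json($1) };
}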

Wow that’s all there is to it! I expected this to be a lot of work, but it turns out Tim Bunce had my back and made this pretty easy. It’s pretty great when something as central as database access has been standardized!

Posted Wed, Jun 8, 2016

My Custom Keyboard

A few years ago I made my own keyboard, specifically an ErgoDox. I’ve been very pleased with it in general and I have finally decided to write about it.

ErgoDox

The ErgoDox is sortav an open-source cross between the Kinesis Advantage and the Kinesis Freestyle. It’s two effectively independent halves that have a similar layout to the Advantage, especially the fact that the keys are in a matrix layout. If you don’t know what that means, think about the layout of a numpad and how the keys are directly above each other as opposed to staggered like the rest of the keyboard. That’s a matrix layout.

The other major feature of the ErgoDox is the thumb clusters. Instead of delegating various common keys like Enter and Backspace to pinky fingers, many keys are pressed by a thumb. Of course the idea is that the thumb is stronger and more flexible and thus more able to deal with consistent usage. I am not a doctor and can’t really evaluate the validity of these claims, but it’s been working for me.

The ErgoDox originally only shipped as a kit, so I ended up soldering all of the diodes, switches, etc together on a long hot day in my home office with a Weller soldering iron I borrowed from work. Of course because I had not done a lot of soldering or even electrical stuff I first soldered half of the diodes on backwards and had to reverse them. That was fun!

Firmware

My favorite thing about my keyboard is that it runs my own custom firmware. It has a number of interesting features, but the coolest one is that when the operator holds down either a or ; the following keys get remapped:

  • h becomes ←
  • j becomes ↓
  • k becomes ↑
  • l becomes →
  • w becomes Ctrl + →
  • b becomes Ctrl + ←
  • y becomes Ctrl + C
  • p becomes Ctrl + V
  • d becomes Ctrl + X
  • u becomes Ctrl + Z
  • x becomes Delete

For those who can’t tell, this is basically a very minimal implementation of vi in the hardware of the keyboard. I can use this in virtually any context. The fact that keys that are not modifiers at all are able to be used in such a manner is due to the ingenuity of TMK.

Keycaps

When I bought the ErgoDox kit from MassDrop I had the option of either buying blank keycaps in a separate but concurrent drop, or somehow scrounging up my own keycaps somewhere else. After a tiny bit of research I decided to get the blank keycaps.

Zodiak

I had the idea for this part of my keyboard after having the keyboard for just a week. I’d been reading Homestuck which inspired me to use the Zodiak for the function keys (F1 through F12.)

After having the idea I emailed Signature Plastics, who make a lot of keycaps, about pricing of some really svelte keys. Note that this is three years ago so I expect their prices are different. (And really the whole keycap business has exploded so who knows.) Here was their response:

In our DCS family, the Cherry MX compatible mount is the 4U. Will all 12 of the Row 5 keycaps have the same text or different text on them? Pricing below is based on each different keycap text. As you will see our pricing is volume sensitive, so if you had a few friends that wanted the same keys as you, you would be better off going that route.

  • 1 pc $98.46 each
  • 5 pcs $20.06 each
  • 10 pcs $10.26 each
  • 15 pcs $6.99 each
  • 25 pcs $4.38 each
  • 50 pcs $2.43 each

Please note that our prices do not include shipping costs or new legend fees should the text you want not be common text. Let me know if you need anything else!

So to be absolutely clear, if I were to get a set all by myself the price would exceed a thousand dollars, for twelve keys. I decided to start the process of setting up a group buy. I’m sad to say that I can’t find the forum where I initiated that. I thought it was GeekHack but there’s no post from me before I had the Zodiak keys.

Anyway just a couple of days after I posted on the forum I got this email from Signature Plastics:

I have some good news! It appears your set has interested a couple people in our company and we have an offer we were wondering if you would consider. Signature Plastics would like to mold these keycaps and place them on our marketplace. In turn for coming up with the idea (and hopefully helping with color selection and legend size) we will offer you a set free of charge… What do you think?

Of course I was totally down. I in fact ordered an extra set myself since I ended up making two of these keyboards eventually! Here’s a screenshot of the keycaps from their store page:

Keycaps

For those who don’t know, these keys are double-shot, which means each key is actually two pieces of plastic: an orange piece (the legend,) and a black piece which contains the legend. This means that no matter how much I type on them, the legend won’t wear off even after twenty years of usage. Awesome.

Stealth

A couple of months after building the keyboard I came to the conclusion that I needed legends on all of the keys. I can touch type just fine, but when doing weird things like pressing hotkeys outside of the context of programming or writing I need the assistance of a legend. So I decided to make my own stealth keycaps.

You can see the original post on GeekHack here.

Here are the pictures from that thread:

Left

Right

Also, if you didn’t already, I recommend reading that short thread. The folks on GeekHack are super friendly, positive, and supportive. If only the rest of the internet could be half as awesome.

Miscellany

The one other little thing I’ve done to the keyboard is to add small rubber O-rings underneath each key. I have cherry blues (which are supposed to click like an IBM Model-M) but with the O-rings the keyboard is both fairly quiet and feels more gentle on my hands. A full depress of a key, though not required with a mechanical switch, is cushioned by the rings.


My keyboard is one of the many tools that I use on a day to day basis to get my job done. It allows me to feel more efficient and take pride in the tools that I’ve built to save myself time and hopefully pain down the road. I have long had an unfinished post in my queue about how all craftspersons should build their own tools, and I think this is a fantastic example of that fine tradition.

Go. Build.

Posted Sat, Jun 4, 2016

Serverless

A big trend lately has been the rise of “serverless” software. I’m not sure I’m the best person to define that term, but my use of the term generally revolves around avoiding a virtual machine (or a real machine I guess.) I have a server on Linode that I’ve been slowly removing services from in an effort to get more “serverless.”

It’s not about chasing fads. I am a professional software engineer and I mostly use Perl; I sorta resist change for the sake of it.

It’s mostly about the isolation of the components. As it stands today my server is a weird install of Debian where the kernel is 64 bit and the userspace is 32 bit. This was fine before, but now it means I can’t run Docker. I had hoped to migrate various parts of my own server to containers to be able to more easily move them to OVH when I eventually leave Linode, but I can’t now.

Services

I could just rebuild the server, but then all of these various services that run on my server would be down for an unknown amount of time. To make this a little more concrete, here are the major services that ran on my server at the beginning of 2016:

  1. Blog (statically served content from Apache)
  2. Lizard Brain (Weird automation thing)
  3. IRC Client (Weechat)
  4. RSS (An install of Tiny Tiny RSS; PHP on Apache)
  5. Feeds (various proxied RSS feeds that I filter myself)
  6. Git repos (This blog and other non-public repositories)
  7. SyncThing (Open source decentralized DropBox like thing)

The above are ordered in terms of importance. If SyncThing doesn’t work for some reason, I might not even notice. If my blog is down I will be very angsty.

Blog

I’ve already posted about when I moved my blog off Linode. That’s been a great success for me. I am pleased that this blog is much more stable than it was before; it’s incredibly secure, despite the fact that it’s “on someone else’s computer;” and it’s fast and cheap!

Feeds

After winning a sweet skateboard from Heroku I decided to try out their software. It’s pretty great! The general idea is that you write some kind of web based app, and it will get run in a container on demand by Heroku, and after a period of inactivity, the app will be shut down.

This is a perfect way for my RSS proxy to run, and it simplified a lot of stuff. I had written code to automatically deploy when I push to GitHub. Heroku already does that. I never took care of automating the installation of deps, but Heroku (or really miyagawa) did.

While I had certificates automatically getting created by LetsEncrypt, Heroku provides the same functionality and I will never need to baby-sit it.

And finally, because my RSS proxy is so light (accessed a few times a day) it ends up being free. Awesome. Thanks Heroku.

AWS Lambda

I originally tried using Lambda for this, but it required a rewrite and I am depending on some non-trivial infrastructural dependencies here. While I would have loved to port my application to Python and have it run for super cheap on AWS Lambda, it just was not a real option without more porting than I am prepared to do right now.

RSS and Git Repos

Tiny Tiny RSS is software that I very much have a love/hate relationship with. Due to the way the community works, I was always a little nervous about using it. After reading a blog post by Filippo Valsorda about Piwik I decided to try out Sandstorm.io on the Oasis. Sandstorm.io is a lot like Heroku, but it’s more geared toward hosting open source software for individuals, with a strong emphasis on security.

You know that friend you have who is a teacher and likes to blog about soccer? Do you really want that friend installing WordPress on a server? You do not. If that friend had an Oasis account, they could use the WordPress grain and almost certainly never get hacked.

I decided to try using Oasis to host my RSS reader and so far it has been very nice. I had one other friend using my original RSS instance (it was in multiuser mode) and he seems to have had no issues with using Oasis either. This is great; I now have a frustrating to maintain piece of software off of my server and also I’m not maintaining it for two. What a load off!

Oasis also has a grain for hosting a git repo, so I have migrated the storage of the source repo of this blog to the Oasis. That was a fairly painless process, but one thing to realize is that each grain is completely isolated, so when you set up a git repo grain it hosts just the one repo. If you have ten repos, you’d be using ten grains. That would be enough that you’d end up paying much more for your git repos.

I’ll probably move my Piwik hosting to the Oasis as well.

Oh also, it’s lightweight enough that it’s free! Thanks Oasis.

Lizard Brain and IRC Client

Lizard Brain is very much a tool that is glued into the guts of a Unix system. One of its core components is atd. As of today, Sandstorm has no scheduler that would allow LB to run there. Similarly, while Heroku does have a scheduler, its granularity is terrible and it’s much more like cron (it’s periodic) than atd (a specific event in time.) Amazon does have scheduled events for Lambda, but unlike Heroku and Sandstorm, that would require a complete rewrite in Python, Java, or JavaScript. I suspect I will rewrite in Python; it’s only about 800 lines, but it would be nice if I didn’t have to.

Another option would be for me to create my own atd, but then I’d have it running in a VM somewhere and if I have a VM running somewhere I have a lot less motivation to move every little service off of my current VM.

A much harder service is IRC. I use my VM as an IRC client so that I will always have logs of conversations that happened when I was away. Over time this has gotten less and less important, but there are still a few people who will reach out to me while I’m still asleep and I’m happy to respond when I’m back. As of today I do not see a good replacement for a full VM just for IRC. I may try to write some kind of thing to put SSH + Weechat in a grain to run on Sandstorm.io, but it seems like a lot of work.

An alternate option, which I do sortav like, is finding some IRC client that runs in the browser and also has an API, so I can use it from my phone, but also have a terminal interface.

The good news is that my Linode will eventually “expire” and I’ll probably get a T2 Nano EC2 instance, which costs about $2-4/month and is big enough (500 MB of RAM) to host an IRC client. Even on my current Linode I’m using only 750 MB of RAM, and if you exclude MySQL (used for TTRSS, still haven’t uninstalled it) and SyncThing it’s suddenly less than 500 MB. Cool!

SyncThing

SyncThing is cool, but it’s not a critical enough part of my setup to require a VM. I am likely to just stop using it since I’ve gone all the way and gotten a paid account for DropBox.

Motivations

A lot of the above are specifics that are almost worthless to most of you. There are real reasons to move to a serverless setup, and I think they are reasons that everyone can appreciate.

Security

Software is consistently and constantly shown to be insecure. Engineers work hard to make good software, but it seems almost impossible for sufficiently complex software to be secure. I will admit that all of the services discussed here are also software, but because of their very structure the user is protected from a huge number of attacks.

Here’s a very simple example: on the Oasis, I have a MySQL instance inside of the TTRSS grain. On my Linode the MySQL server could potentially be misconfigured to be listening on a public interface, maybe because some PHP application installer did that. On the Oasis that’s not even possible, due to the routing of the containers.

Similarly, on Heroku, if there were some crazy kernel bug that needed to be resolved, because my application is getting spun down all the time, there are plenty of chances to reboot the underlying virtual machines without me even noticing.

Isolation

Isolation is a combination of a reliability and security feature. When it comes to security it means that if my blog were to get hacked, my TTRSS instance is completely unaffected. Now I have to admit this is a tiny bit of a straw man, because if I set up each of my services as separate users they’d be fairly well isolated. I didn’t do that though because that’s a hassle.

The reliability part of isolation is a lot more considerable though. If I tweak the Apache site config for TTRSS and run /etc/init.d/apache restart and had a syntax error, all of the sites being hosted on my machine go down till I fix the issue. While I’ve learned various ways to ensure that does not happen, “be careful” is a really stupid way to ensure reliability.

Cost

I make enough money to pay for a $20/mo Linode, but it just seems like a waste of money that could be put to better uses. Without a ton of effort I can cut my total spend in half, and I suspect I could drop it to about 10%. As mentioned in the past, my blog is costing less than a dime a month and is rock-solid.

Problems

Nothing is perfect though. While I am mostly sold on the serverless phenomenon, there are some issues that I think need solving before it’s an unconditional win.

Storage (RDBMS etc)

This sorta blows my mind. With the exception of Sandstorm.io, which is meant for small numbers of users for a given application, no one really offers a cheap database. Heroku has a free database option that I couldn’t have used with my RSS reader, and the for-pay option would cost about half what I pay for my VM, just for the database.

Similarly AWS offers RDS, but that’s really just the cost of an EC2 VM, so at the very least that would be a consistent $2/mo. If you were willing to completely rewrite your application you might be able to get by using DynamoDB, but in my experience using it at work it can be very frustrating to tune.

I really think that someone needs to come in and do what Lambda did for code or DynamoDB did for KV stores, but for a traditional database. Basically, as it stands today, if you have a database that is idle you pay the same price as you would for a database that is pegging its CPU. I want a traditional database that is billed based on usage.

Billing Models

Speaking of billing a database based on usage, more things need to be billed based on usage! I am a huge fan of most of the billing models on AWS, where you end up paying for what you use. For someone self hosting for personal consumption this almost always means that whatever you are doing will cost less than any server you could build. I would gladly pay for my Oasis usage, but a jump from free to $9 is just enough for me to instead change my behaviour and instead spend that money elsewhere.

If someone who works on Sandstorm.io is reading this and cares: I would gladly pay hourly per grain.

I have not yet used enough of Heroku to need to use the for pay option there, but it looks like I could probably use it fairly cheaply.

haters

Of course there will be some people who read this who think that running on anything but your own server is foolish. I wonder if those people run directly on the metal, or just assume that all of the Xen security bugs have been found. I wonder if those people regularly update their software for security patches and know to restart all of the various components that need to be restarted. I wonder if those people value their own time and money.


Hopefully before July I will only be using my server for IRC and Lizard Brain. There’s no rush to migrate since my Linode has almost 10 months before a rebill cycle. I do expect to test how well a T2 Nano works for my goals in the meantime though, so that I can easily pull the trigger when the time comes.

Posted Wed, Jun 1, 2016

Iterating over Chunks of a Diff in Vim

Every now and then at work I’ll make broad, sweeping changes in the codebase. The one I did recently was replacing all instances of print STDERR "foo\n" with warn "foo\n". There were about 160 instances in all that I changed. After discussing it more with my boss, we decided that instead of blindly replacing all those print statements with warns (which, for those who don’t know, are easier to intercept and log) we should just log at the right log level.

Enter Quickfix

Quickfix sounds like some kind of bad guy from a slasher movie to me, but it’s actually a super handy feature in Vim. Here’s what the manual says:

Vim has a special mode to speedup the edit-compile-edit cycle. This is inspired by the quickfix option of the Manx’s Aztec C compiler on the Amiga. The idea is to save the error messages from the compiler in a file and use Vim to jump to the errors one by one. You can examine each problem and fix it, without having to remember all the error messages.

More concretely, the quickfix commands end up giving the user a list of locations. I tend to use the quickfix list most commonly with Fugitive. You can run the command :Ggrep foo and the quickfix list will contain all of the lines that git found containing foo. Then, to iterate over those locations you can use :cnext, :cprev, :cwindow, and many others, to interact with the list.

I have wanted a way to populate the quickfix list with the locations of all of the chunks that are in the current modified files for a long time, and this week I decided to finally do it.

First off, I wrote a little tool to parse diffs and output locations:

#!/usr/bin/env perl

use strict;
use warnings;

my $filename;
my $line;
my $offset = 0;
my $printed = 0;
while (<STDIN>) {
   if (m(^\+\+\+ b/(.*)$)) {
      $printed = 0;
      $filename = $1;
   } elsif (m(^@@ -\d+(?:,\d+)? \+(\d+))) {
      $line = $1;
      $offset = 0;
      $printed = 0;
   } elsif (m(^\+(.*)$)) {
      my $data = $1 || '-';
      print "$filename:" . ($offset + $line) . ":$data\n"
         unless $printed;
      $offset++;
      $printed = 1;
   } elsif (m(^ )) {
      $printed = 0;
      $offset++;
   }
}

The general usage is something like: git diff | diff-hunk-list, and the output will be something like:

app/lib/ZR/Plack/Middleware/AccessLog.pm:195:  local $SIG{USR1} = sub {
bin/zr-plack-reaper:22:-
bin/zr-plack-reaper:29:sub timeout { 120 }

The end result is a location for each new set of lines in a given diff. That means that deleted lines will not be included by this tool. Another tool, or more options for this tool, would be needed for that functionality.

Then, I added the following to my vimrc:

command Gdiffs cexpr system('git diff \| diff-hunk-list')

So now I can simply run :Gdiffs and iterate over all of my changes, possibly tweaking them along the way!

Super Secret Bonus Content

The Quickfix is great, but there are a couple other things that I think really round out the functionality.

First: the Quickfix is global per session, so if you do :Gdiffs and then :Ggrep to refer to some other code, you’ve blown away the original quickfix list. There’s another list called the location list, which is scoped to a window. Also very useful; tends to use commands that start with l instead of c.

Second: There is another Tim Pope plugin called unimpaired which adds a ton of useful mappings, including [q and ]q to go back and forth in the quickfix list, and [l and ]l to go back and forth in the location list. Please realize that the plugin does way more than just those two things, but I do use it for those the most.

Posted Wed, May 25, 2016

OSCON 2016

ZipRecruiter, where I work, generously pays for each engineer to go at least one conference a year. I have gone to YAPC every year since 2009 and would not skip it, except my wife is pregnant with our second child and will be due much too close to this year’s YAPC (or should I say instead: The Perl Conference?) for me to go.

There were a lot of conferences that I wanted to check out; PyCon, Monitorama, etc etc, but OSCON was the only one that I could seem to make work out with my schedule. I can only really compare OSCON to YAPC and to a lesser extent SCALE and the one time I went to the ExtJS conference (before it was called Sencha,) so my comparisons may be a little weird.

Something Corporate

OSCON is a super corporate conference, which is surprising given that its name includes Open Source. For the most part this is fine; it means that there is a huge amount of swag (more on that later,) lots of networking to be done, and many free meals. On the other hand OSCON is crazy expensive; I would argue not worth the price. I got the lowest tier, since my wife didn’t want me to be gone for the full four days (and probably six including travel,) and it cost me a whopping twelve hundred dollars. Of course ZipRecruiter reimbursed me, but for those who are used to this, YAPC costs $200 max, normally.

On top of that there were what are called “sponsored talks.” I was unfamiliar with this concept but the basic idea is that a company can pay a lot of money and be guaranteed a slot, which is probably a keynote, to sortav shill their wares. I wouldn’t mind this if it weren’t for the fact that these talks, as far as I could tell, were universally bad. The one that stands out the most was from IBM, with this awesome line (paraphrased:)

Oh if you don’t use Swagger.io you’re not really an engineer. Maybe go back and read some more Knuth.

Swag

At YAPC you tend to get 1-3 shirts, some round tuits, and maybe some stickers. At OSCON I avoided shirts and ended up with six; I got a pair of socks, a metal bottle, a billion pretty awesome stickers, a coloring book, three stress toys, and a THOUSAND DOLLAR SKATEBOARD. To clarify, not everyone got the skateboard; the deal was that you had to get a Heroku account (get socks!), run a node app on your laptop (get shirt!), and then push it up to Heroku (get entered into the drawing!) Most people gave up at step two because they had tablets or something, but I did it between talks because all of that was super easy on my laptop. I actually was third in line after the drawing, but first and second never showed. Awesome!

The Hallway Track

For me the best part of any conference is what is lovingly called “the hallway track.” The idea is that the hallway, where socializing and networking happen, is equally important to all the other tracks (like DevOps, containers, or whatever.) I really enjoy YAPC’s hallway track, though a non-trivial reason is that I already have many friends in the Perl and (surprisingly distinct) YAPC world. On top of that YAPC tends to be in places that are very walkable, so it’s easy to go to a nice restaurant or bar with new friends.

I was pleasantly surprised by the OSCON hallway track. It was not as good as YAPC’s, but it was still pretty awesome. Here are my anecdotes:

Day 1 (Wed)

At lunch I hung out with Paul Fenwick and a few other people, which was pretty good. Chatting with Paul is always great and of course we ended up talking about ExoBrain and my silly little pseudoclone: Lizard Brain.

At dinner I decided to take a note from Fitz Elliot’s book, who once approached me after I did a talk and hung out with me a lot during the conference. I had a lot of good conversations with Fitz and I figured that maybe I could be half as cool as him and do the same thing. The last talk I went to was about machine learning and the speaker, Andy Kitchen, swerved into philosophy a few times, so I figured we’d have a good time and get along if I didn’t freak him out too much by asking if we could hang out. I was right, we (him, his partner Laura Summers, a couple other guys, and I) ended up going to a restaurant and just having a generally good time. It was pretty great.

Day 2 (Thu)

At lunch on Thursday I decided to sit at the Perl table and see who showed up. Randal Schwartz, who I often work with, was there, which was fun. A few other people were there. Todd Rinaldo springs to mind. I’ve spoken to him before, but this time we found an important common ground in trying to reduce memory footprints. I hope to collaborate with him to an extent and publish our results.

Dinner was pretty awesome. I considered doing the same thing I did on Wednesday, but I thought it’d be hugely weird to ask the girl who did the last talk I saw if she wanted to get dinner. That means something else, usually. So I went to the main area where people were sorta congregating and went to greet some of the Perl people that I recognized (Liz, Wendy, David H. Adler.) They were going to Max’s Wine Bar and ended up inviting me, along with another girl whose name I sadly cannot remember. Larry Wall (who invented Perl,) and his wife Gloria and one of his sons joined us, which was pretty fun. At the end of dinner (after I shared an amazing pair of desserts with Gloria) Larry and Wendy fought over who would pay the bill, and Larry won. This is always pretty humbling and fun. The punchline was that the girl who came with us didn’t know who Larry was, because she was mostly acquainted with Ruby. When Wendy told her there were many pictures taken. It was great.

Day 3 (Fri)

Most of Friday I tried to chill and recuperate. I basically slept, packed, went downtown to get lunch and coffee, and then waited for a cab to the airport. Then when I got to the airport I was noticed by another OSCON attendee (Julie Gunderson) because I was carrying the giant branded skateboard. She was hanging out with AJ Bowen and Jérôme Petazzoni, and they were cool with me tagging along with them to get a meal before we boarded the plane. It’s pretty cool that we were able to have a brief last hurrah after the conference was completely over.

Perl

One thing that I was pretty disappointed in was the general reaction when I mentioned that I use Perl. I have plenty of friends in Texas who think poorly of Perl, but I had assumed that was because they mostly worked on closed source software. The fact that a conference that was originally called The Perl Conference would end up encouraging such an anti-Perl attitude is very disheartening.

Don’t get me wrong, Perl is not perfect, but linguistic rivalries only alienate people. I would much rather you tell me some exciting thing you did with Ruby than say “ugh, why would someone build a startup on Perl?” I have a post in the queue about this, so I won’t say a lot more about this. If you happen to read this and are a hater, maybe don’t be a hater.


Overall the conference was a success for me. If I had to choose between a large conference like OSCON and a small conference like YAPC, I’d choose the latter. At some point I’d like to try out the crazy middle ground of something like DefCon where it’s grass roots but not corporate. Maybe in a few years!

Posted Fri, May 20, 2016

Faster DBI Profiling

Nearly two months ago I blogged about how to do profiling with DBI, which of course was about the same time we did this at work.

At the same time there was a non-trivial slowdown in some pages on the application. I spent some time trying to figure out why, but never made any real progress. On Monday of this week Aaron Hopkins pointed out that we had set $DBI::Profile::ON_DESTROY_DUMP to an empty code reference. If you take a peek at the code you’ll see that setting this to a coderef is much less efficient than it could be.

So the short solution is to set $DBI::Profile::ON_DESTROY_DUMP to false.

A better solution is to avoid the global entirely by making a simple subclass of DBI::Profile. Here’s how I did it:

package MyApp::DBI::Profile;

use strict;
use warnings;

use parent 'DBI::Profile';

sub DESTROY {}

1;

This correctly causes the destructor to do nothing, and allows us to avoid setting globals. If you are profiling all of your queries like we are, you really should do this.
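
Wiring the subclass in looks something like the following; the path spec before the slash is whatever profile path you were already using with DBI::Profile:

# enable profiling on a handle using the subclass instead of DBI::Profile
$dbh->{Profile} = '!Statement/MyApp::DBI::Profile';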

Posted Wed, May 18, 2016

Setting up Let's Encrypt and Piwik

Late last week I decided that I wanted to set up Piwik on my blog. I’ll go into how to do that later in the post, but first I ran into a frustrating snag: I needed another TLS certificate. Normally I use StartSSL, because I’ve used them in the past, and I actually started to attempt to go down the path of getting another certificate through them this time, but I ran into technical difficulties that aren’t interesting enough to go into.

Let’s Encrypt

I decided to finally bite the bullet and switch to Let’s Encrypt. I’d looked into setting it up before but the default client was sorta heavyweight, needing a lot of dependencies installed and maybe more importantly it didn’t support Apache. On Twitter at some point I read about acmetool, a much more predictable tool with automated updating of certificates built in. Here’s how I set it up:

Install acmetool

I’m on Debian, but since it’s a static binary, as the acmetool documentation states, the Ubuntu repository also works:

sudo sh -c \
  "echo 'deb http://ppa.launchpad.net/hlandau/rhea/ubuntu xenial main' > \
      /etc/apt/sources.list.d/rhea.list"
sudo apt-key adv \
  --keyserver keyserver.ubuntu.com \
  --recv-keys 9862409EF124EC763B84972FF5AC9651EDB58DFA
sudo apt-get update
sudo apt-get install acmetool

Configure acmetool

First I ran sudo acmetool quickstart. My answers were:

  • 1, to use the Live Let’s Encrypt servers
  • 2, to use the PROXY challenge requests

And I think it asked to install a cronjob, which I said yes to.

Get some certs

This is assuming you have your DNS configured so that your hostname resolves to your IP address. Once that’s the case you should simply be able to run this command to get some certs:

sudo acmetool want \
  piwik.afoolishmanifesto.com \
     st.afoolishmanifesto.com \
    rss.afoolishmanifesto.com

Configure Apache with the certs

There were a couple little things I had to do to get multiple certificates (SNI) working on my server. First off, /etc/apache2/ports.conf needs to look like this:

NameVirtualHost *:443
Listen 443

Note that my server is TLS only; if you support unencrypted connections obviously the above will be different.

Next, edit each site that you are enabling. So for example, my /etc/apache2/sites-available/piwik looks like this:

<VirtualHost *:443>
        ServerName piwik.afoolishmanifesto.com
        ServerAdmin webmaster@localhost

        SSLEngine on
        SSLCertificateFile      /var/lib/acme/live/piwik.afoolishmanifesto.com/cert
        SSLCertificateKeyFile   /var/lib/acme/live/piwik.afoolishmanifesto.com/privkey
        SSLCertificateChainFile /var/lib/acme/live/piwik.afoolishmanifesto.com/chain

        ProxyPass "/.well-known/acme-challenge" "http://127.0.0.1:402/.well-known/acme-challenge"
        DocumentRoot /var/www/piwik
        <Location />
                Order allow,deny
                allow from all
        </Location>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn

        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

I really like that the certificate files end up in a place that is predictable and clear.

After doing the above configuration, you should be able to restart apache (sudo /etc/init.d/apache2 restart), access your website, and see it using a freshly minted Let’s Encrypt certificate.

Configure auto-renewal

Let’s Encrypt certificates do not last very long at all. Normally a cheap or free certificate will last a year, a more expensive one will last two years, and some special expensive EV certs can last longer, with I think a normal max of five? The Let’s Encrypt ones last ninety days. With an expiration so often, automation is a must. This is where acmetool really shines. If you allowed it to install a cronjob it will periodically renew certificates. That’s all well and good but your server needs to be informed that a new certificate has been installed. The simplest way to do this is to edit the /etc/default/acme-reload file and set SERVICES to apache2.

Piwik

The initiator of all of the above was to set up Piwik. If you haven’t heard of Piwik, it’s basically a locally hosted Google Analytics. The main benefit is that people who use various ad-blockers and privacy tools will not be blocking you, and reasonably so, as your analytics never leave your server.

The install was fairly straight forward. The main thing I did was follow the instructions here and then when it came to the MySQL step I ran the following commands as the mysql root user (mysql -u root -p):

CREATE DATABASE piwik;
CREATE USER 'piwik'@'localhost' IDENTIFIED BY 'somepassword';
use piwik;
GRANT ALL PRIVILEGES ON *.* TO 'piwik'@'localhost';

So now that I have Piwik I can see interesting information much more easily than before, when I wrote my own little tools to parse access logs. Pretty neat!

Posted Sat, May 14, 2016

Rage Inducing Bugs

I have run into a lot of bugs lately. Maybe it’s actually a normal amount, but these bugs, especially taken together, have caused me quite a bit of rage. Writing is an outlet for me and at the very least you can all enjoy the show, so here goes!

X11 Text Thing

I tweeted about this one a few days ago. The gist is that, sometimes, when going back and forth from suspend, font data in video memory gets corrupted. I have a theory that it has to do with switching from X11 to DRI (the old school TTYs), but it is not the most reproducible thing in the world, so this is where I’ve had to leave it.

Firefox

I reported a bug against Firefox recently about overlay windows not getting shown. There is a workaround for this, and the Firefox team (or at least a Firefox individual, Karl Tomlinson) has been willing to look at my errors and have a dialog with me. I have a sinking feeling that this could be a kernel driver bug or maybe a GTK3 bug, but I have no idea how to verify that.

Vim SIGBUS

Vim has been crashing on my computer for a while now. I turned on coredumps for gvim only so that I could repro it about three weeks ago, and I finally got the coveted core yesterday. I dutifully inspected the core dump with gdb and got this:

Program terminated with signal SIGBUS, Bus error.
#0  0x00007fa650515757 in ?? ()

Worthless. I knew I needed debugging symbols, but it turns out there is no vim-dbg. A while ago (like, eight years ago) Debian (and thus Ubuntu) started storing debugging symbols in a completely separate repository. Thankfully a Debian developer, Niels Thykier, was kind enough to point this out to me, so I was able to install the debugging symbols. If you want to do that yourself you can follow instructions here, but I have to warn you, you will get errors, because I don’t think Ubuntu has really put much effort into this working well.

After installing the debugging symbols I got this much more useful backtrace:

#0  0x00007fa650515757 in kill () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555fad98c273 in may_core_dump () at os_unix.c:3297
#2  0x0000555fad98dd20 in may_core_dump () at os_unix.c:3266
#3  mch_exit (r=1) at os_unix.c:3263
#4  <signal handler called>
#5  in_id_list (cur_si=<optimized out>, cur_si@entry=0x555fb0591700, list=0x6578655f3931313e, 
    ssp=ssp@entry=0x555faf7497a0, contained=0) at syntax.c:6193
#6  0x0000555fad9fb902 in syn_current_attr (syncing=syncing@entry=0, displaying=displaying@entry=0, 
    can_spell=can_spell@entry=0x0, keep_state=keep_state@entry=0) at syntax.c:2090
#7  0x0000555fad9fc1b4 in syn_finish_line (syncing=syncing@entry=0) at syntax.c:1781
#8  0x0000555fad9fcd3f in syn_finish_line (syncing=0) at syntax.c:758
#9  syntax_start (wp=0x555faf633720, lnum=3250) at syntax.c:536
#10 0x0000555fad9fcf45 in syn_get_foldlevel (wp=0x555faf633720, lnum=lnum@entry=3250) at syntax.c:6546
#11 0x0000555fad9167e9 in foldlevelSyntax (flp=0x7ffe2b90beb0) at fold.c:3222
#12 0x0000555fad917fe8 in foldUpdateIEMSRecurse (gap=gap@entry=0x555faf633828, level=level@entry=1, 
    startlnum=startlnum@entry=1, flp=flp@entry=0x7ffe2b90beb0, 
    getlevel=getlevel@entry=0x555fad9167a0 <foldlevelSyntax>, bot=bot@entry=7532, topflags=2)
    at fold.c:2652
#13 0x0000555fad918dbf in foldUpdateIEMS (bot=7532, top=1, wp=0x555faf633720) at fold.c:2292
#14 foldUpdate (wp=wp@entry=0x555faf633720, top=top@entry=1, bot=bot@entry=2147483647) at fold.c:835
#15 0x0000555fad919123 in checkupdate (wp=wp@entry=0x555faf633720) at fold.c:1187
#16 0x0000555fad91936a in checkupdate (wp=0x555faf633720) at fold.c:217
#17 hasFoldingWin (win=0x555faf633720, lnum=5591, firstp=0x555faf633798, lastp=lastp@entry=0x0, 
    cache=cache@entry=1, infop=infop@entry=0x0) at fold.c:158
#18 0x0000555fad91942e in hasFolding (lnum=<optimized out>, firstp=<optimized out>, 
    lastp=lastp@entry=0x0) at fold.c:133
#19 0x0000555fad959c3e in update_topline () at move.c:291
#20 0x0000555fad9118ee in buf_reload (buf=buf@entry=0x555faf25e210, orig_mode=orig_mode@entry=33204)
    at fileio.c:7155
#21 0x0000555fad911d0c in buf_check_timestamp (buf=buf@entry=0x555faf25e210, focus=focus@entry=1)
    at fileio.c:6997
#22 0x0000555fad912422 in check_timestamps (focus=1) at fileio.c:6664
#23 0x0000555fada1091b in ui_focus_change (in_focus=<optimized out>) at ui.c:3203
#24 0x0000555fad91fd96 in vgetc () at getchar.c:1670
#25 0x0000555fad920019 in safe_vgetc () at getchar.c:1801
#26 0x0000555fad96e775 in normal_cmd (oap=0x7ffe2b90c440, toplevel=1) at normal.c:627
#27 0x0000555fada5d665 in main_loop (cmdwin=0, noexmode=0) at main.c:1359
#28 0x0000555fad88d21d in main (argc=<optimized out>, argv=<optimized out>) at main.c:1051

I am already part of the vim mailing list, so I sent an email and can see responses coming in (though sadly not CC’d to me) as I write this post, so hopefully this will be resolved soon.

Linux Kernel Bugs

I found a bug in the Linux Kernel, probably related to the nvidia drivers, but I’m not totally sure. I’d love for this to get resolved, though reporting kernel bugs to Ubuntu has not gone well for me in the past.

Vim sessions

The kernel bug above causes the computer to crash during xrandr events; this means I end up with vim writing a fresh session file during the event (thanks to the excellent Obsession by Tim Pope) and the session file getting hopelessly corrupted, because the write fails partway through.

I foolishly mentioned this on the #vim channel on freenode and was reminded how often presence in an IRC channel is unrelated to aptitude. The people in the channel seemed to think that if the kernel crashes, there is nothing a program can do to avoid losing data. I will argue that while it is hard, it is not impossible. The most basic thing that can and should be done is:

  1. Write to a tempfile
  2. Rename the tempfile to the final file

This should be atomic and safe. There are many ways that dealing with files can go wrong, but to believe it is impossible to protect against them is unimpressive, to say the least.
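
For the curious, here is a minimal sketch of that approach in Perl. The filename and session contents are made up for illustration; the important detail is that the rename only happens after the temp file has been completely written and synced to disk, so a crash mid-write leaves the old file intact.

use strict;
use warnings;
use File::Temp ();

sub atomic_write {
  my ($path, $contents) = @_;

  # the temp file must live in the same directory (and thus filesystem)
  # as the target for rename() to be atomic
  my $tmp = File::Temp->new( DIR => '.', UNLINK => 0 );

  print {$tmp} $contents or die "write failed: $!";
  $tmp->flush;
  $tmp->sync;                          # push the data all the way to disk
  close $tmp or die "close failed: $!";

  # atomically replace the old file with the fully written new one
  rename $tmp->filename, $path
    or die "rename to $path failed: $!";
}

atomic_write('Session.vim', "let v:this_session = 'example'\n");

Even if the machine dies at any point before the rename completes, the previous session file is untouched; the worst case is a stray temp file left behind.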

I will likely submit the above as a proper bug to the Vim team tomorrow. In the meantime this must also be done in Obsession, and I have submitted a small patch to do what I outlined above. I’m battle-testing it now and will know soon if it resolves the problem.


I feel better. At the very least I’ve submitted bugs, and in one of the most annoying cases, been able to submit a patch. When you run into a bug, why not do the maintainer a solid and report it? And if you can, fix it!

Posted Tue, May 10, 2016

Putting MySQL in Timeout

At work we are pushing hard to scale our service to serve more users with fewer outages. Exciting times!

One of the main problems we’ve had since I arrived is that MySQL 5.6 doesn’t really support query timeouts. It has stall timeouts, but if a query takes too long there’s not a great way to cancel it. I worked on resolving this a few months ago and was disappointed that I couldn’t seem to come up with a good solution that was simple enough not to scare me.

A couple weeks ago we hired a new architect (Aaron Hopkins) and he, along with some ideas from my boss, Bill Hamlin, came up with a pretty elegant and simple way to tackle this.

The solution has two parts: the client side and a reaper. On the client you simply set stall timeouts; this example is Perl, but any MySQL driver should expose these connection options:

my $dbh = DBI->connect('dbi:mysql:...', 'zr', $password, {
   mysql_read_timeout  => 2 * 60,
   mysql_write_timeout => 2 * 60,
   ...,
});

This will at the very least cause the client to stop waiting if the database disappears. It will not help if the client is pulling rows down over the course of 10 minutes but getting a new row every 30s, because each row that arrives resets the stall timeout.
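
As a quick sanity check (this is my own hypothetical smoke test, not something from our codebase), you can point a handle with a short read timeout at a query that is guaranteed to stall and confirm that it errors out instead of hanging:

use strict;
use warnings;
use DBI;

# hypothetical smoke test: a 5 second read timeout against a 60 second sleep
my $dbh = DBI->connect(
  'dbi:mysql:host=localhost', 'zr', $ENV{MYSQL_PWD},
  { RaiseError => 1, mysql_read_timeout => 5, mysql_write_timeout => 5 },
);

my $ok = eval { $dbh->do('SELECT SLEEP(60)'); 1 };
print $ok ? "the timeout did not fire\n" : "timed out as expected: $@";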

To resolve the above problem, we have a simple reaper script:

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use Getopt::Long::Descriptive;
use JSON;
use Linux::Proc::Net::TCP;
use Sys::Hostname;

my $actual_host = hostname();

# assumed command line handling: --noaction logs what would be killed
# without actually killing anything
my ($opt, $usage) = describe_options(
  'reaper %o',
  [ 'noaction', 'log queries that would be killed, but do not kill them' ],
);

my $max_timeout = 2 * 24 * 60 * 60;
$max_timeout = 2 * 60 * 60 if $actual_host eq 'db-master';

my $dbh = DBI->connect(
  'dbi:mysql:host=localhost',
  'root',
  $ENV{MYSQL_PWD},
  {
    RaiseError => 1,
    mysql_read_timeout => 30,
    mysql_write_timeout => 30,
  },
);

my $sql = <<'SQL';
SELECT pl.id, pl.host, pl.time, pl.info
  FROM information_schema.processlist pl
 WHERE pl.command NOT IN ('Sleep', 'Binlog Dump') AND
       pl.user NOT IN ('root', 'system user') AND
       pl.time >= 2 * 60
SQL

while (1) {
  my $sth = $dbh->prepare_cached($sql);
  $sth->execute;

  my $connections;

  while (my $row = $sth->fetchrow_hashref) {
    kill_query($row, 'max-timeout') if $row->{time} >= $max_timeout;

    if (my ($json) = ($row->{info} =~ m/ZR_META:\s+(.*)$/)) {
      my $data = decode_json($json);

      kill_query($row, 'web-timeout') if $data->{catalyst_app};
    }

    $connections ||= live_connections();
    kill_query($row, 'zombie') unless $connections->{$row->{host}}
  }

  sleep 1;
}

sub kill_query {
  my ($row, $reason) = @_;
  no warnings 'exiting';

  warn sprintf "killing «%s», reason %s\n", $row->{info}, $reason;
  $dbh->do("KILL CONNECTION ?", undef, $row->{id}) unless $opt->noaction;
  next;
}

sub live_connections {
  my $table = Linux::Proc::Net::TCP->read;

  return +{
    map { $_->rem_address . ':' . $_->local_port => 1 }
    grep $_->st eq 'ESTABLISHED',
    @$table
  }
}

There are a lot of subtle details in the above script, so I’ll do a little bit of exposition. First off, the reaper runs directly on the database server. We define the absolute maximum timeout based on the hostname of the machine: 2 days for the reporting and read-only minions, and 2 hours for the master.

The SQL query grabs all running queries but ignores a certain set of them. Importantly, we have to whitelist a couple of users: one (root) is where extremely long-running DDL takes place, and the other (system user) is doing replication, basically constantly.

We iterate over the returned queries, immediately killing those that have run longer than the maximum timeout. Any query that our ORM (DBIx::Class) generated has a little bit of logging appended as a comment containing JSON. We can use that to tweak the timeout further: initially by choking web requests down to a shorter timeout, and later we’ll likely allow users to set a custom timeout directly in that comment.
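
To make that concrete, here is a hypothetical example of the kind of tagged query the reaper matches on; the table, the comment placement, and the catalyst_app value are all made up for illustration, and only the ZR_META: prefix and JSON payload come from the reaper’s regex above.

use strict;
use warnings;
use JSON;

# a made-up query tagged the way our ORM-generated queries are
my $meta = encode_json({ catalyst_app => 'ZR::Web' });
my $sql  = "SELECT * FROM users WHERE id = 42 -- ZR_META: $meta";

# the same match the reaper applies to processlist.info
if (my ($json) = ($sql =~ m/ZR_META:\s+(.*)$/)) {
  my $data = decode_json($json);
  print "came from the web app, use the shorter timeout\n"
    if $data->{catalyst_app};
}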

Finally, we kill queries whose client has given up the ghost. I did a test a while ago where I started a query and then killed the script running it, and I could see that MySQL kept running the query; presumably it keeps going because, for all it knows, the query could be some kind of long-running UPDATE that ought to finish even if the client goes away. I expect the timeouts will be the main cause of query reaping, but this is a nice stopgap that can pare down queries whose clients have crashed.

I am very pleased with this solution. I even think that if we eventually switch to Aurora, everything except the zombie checking will continue to work.

Posted Sun, May 8, 2016

A new Join Prune in DBIx::Class

At work a coworker and I recently went on a rampage cleaning up our git branches. Part of that means I need to clean up my own small pile of unmerged work. One of those branches is an unmerged change to our subclass of the DBIx::Class storage layer that adds a new kind of join prune.

If you didn’t know, good databases can avoid doing joins entirely by looking at the query and seeing where (or whether) the joined-in table is actually used. DBIx::Class does the same thing for databases that do not have such smarts built in. In fact there was a time when it could prune certain kinds of joins that even the lauded PostgreSQL could not, though that may no longer be the case.

The rest of what follows in this blog post is a very slightly tidied up commit message of the original branch. Enjoy!


Recently Craig Glendenning found a query in the ZR codebase that was using significant resources; the main problem was that it joined in a relationship it didn’t need. We fixed the query, but I was confused, because DBIx::Class has a built-in join pruner and I expected it to have transparently solved this issue.

It turns out we found a new case where the join pruner can apply!

If you have a query that matches all of the following conditions:

  • a relationship is joined with a LEFT JOIN
  • that relationship is not in the WHERE
  • that relationship is not in the SELECT
  • the query is limited to one row

You can remove the matching relationship. The WHERE and SELECT conditions should be obvious: if a relationship is used in the WHERE clause, it has to be joined for the WHERE clause to be able to match against its columns, and likewise it has to be joined for its columns to be referenced in the SELECT clause.

The one row and LEFT JOIN conditions are more subtle, but basically consider this case:

You have a query with a limit of 2 and you join in a relationship that has zero or more related rows per root row. If every root row has zero related rows, you effectively just get the first two rows of the root table. But if each root row had two related rows, both rows of the limited result would come from the first root row, so you would only get back one row from the root table.

Similarly, the reason LEFT is required is that a standard INNER JOIN filters the root table down to rows that have at least one related row.

If you only ask for a single row, a LEFT JOIN neither filters the root table nor, thanks to the single-row limit, lets the “exploding” nature of the relationship change which root row comes back, so you will always get the same row whether or not the join is performed. That is exactly what makes the join safe to prune.
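
As a concrete illustration, a resultset shaped like the following would meet all four conditions; the Order source, its line_items has_many relationship, and the column names are hypothetical, not from our actual schema.

# assumes $schema is a connected DBIx::Class::Schema whose Order source
# has_many line_items (a has_many rel joins as a LEFT JOIN by default)
my $order = $schema->resultset('Order')->search(
  { status => 'open' },             # WHERE touches only the root table
  {
    join    => 'line_items',        # LEFT JOIN, never otherwise referenced
    columns => [qw( id status )],   # SELECT touches only the root table
    rows    => 1,                   # limited to a single row
  },
)->first;

With no reference to line_items in the WHERE or SELECT and only one row requested, the generated LEFT JOIN cannot change the result, so the pruner can drop it and query the Order table alone.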


I’ve pushed the change that adds the new join prune to GitHub, and notified the current maintainer of DBIx::Class in the hopes that it can get merged in for everyone to enjoy.

Posted Fri, Apr 29, 2016