A new Join Prune in DBIx::Class

At work a coworker and I recently went on a rampage cleaning up our git branches. Part of that means I need to clean up my own small pile of unmerged work. One of those branches is an unmerged change to our subclass of the DBIx::Class Storage Layer to add a new kind of join prune.

If you didn’t know, good databases can avoid doing joins at all by looking at the query and seeing whether (and where) the joined-in table is actually used. DBIx::Class does the same thing for databases that do not have such tooling built in. In fact there was a time when it could prune certain kinds of joins that even the lauded PostgreSQL could not. That may no longer be the case though.

The rest of what follows in this blog post is a very slightly tidied up commit message of the original branch. Enjoy!


Recently Craig Glendenning found a query in the ZR codebase that was using significant resources; the main problem was that it included a relationship but didn’t need to. We fixed the query, but I was confused because DBIx::Class has a built in join pruner and I expected it to have transparently solved this issue.

It turns out we found a new case where the join pruner can apply!

If you have a query that matches all of the following conditions:

  • a relationship is joined with a LEFT JOIN
  • that relationship is not in the WHERE
  • that relationship is not in the SELECT
  • the query is limited to one row

You can remove the matching relationship. The WHERE and SELECT conditions should be obvious: if a relationship is used in the WHERE clause, you need it to be joined for the WHERE clause to be able to match against the column. Similarly, for the SELECT clause the relationship must be included so that the column can actually be referenced in the SELECT clause.

The one row and LEFT JOIN conditions are more subtle, but basically consider this case:

You have a query with a limit of 2 and you join in a relationship that has zero or more related rows. If you get back zero rows for all of the relationships, the root table will basically be returned and you’ll just get the first two rows from that table. But consider if you got back two related rows for each row in the root table: you would only get back the first row from the root table.

Similarly, the reason that LEFT is specified is that if it were a standard INNER JOIN, the relationship would filter the root table based on whether related rows exist.

If you limit the query to a single row, a LEFT JOINed relationship does not filter the root table, and the “exploding” nature of relationships described above does not apply, so you will always get the same row.
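
To make that concrete, here is a minimal sketch of a query shape that meets all four conditions. The Post resultset and its comments relationship are hypothetical, but a has_many relationship is LEFT JOINed by default, so something like this is a candidate for the new prune:

my $post = $schema->resultset('Post')->search(
  { 'me.author_id' => 42 },            # WHERE touches only the root table
  {
    join    => 'comments',             # LEFT JOIN, never referenced below
    columns => [qw( me.id me.title )], # SELECT touches only the root table
    rows    => 1,                      # limited to a single row
  },
)->single;

Because comments shows up in neither the WHERE nor the SELECT, and only one row is requested, the join can be dropped from the generated SQL entirely.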


I’ve pushed the change that adds the new join prune to GitHub, and notified the current maintainer of DBIx::Class in the hopes that it can get merged in for everyone to enjoy.

Posted Fri, Apr 29, 2016

Python: Taking the Good with the Bad

For the past few months I’ve been working on a side project using Python. I’ll post about that project some other time, but now that I’ve used Python a little bit I think I can consider it more reasonably (and not just react with “meaningful whitespace?!?”).

It’s much too easy to write a bunch of stuff that is merely justification of the status quo (in my case that is the use of Perl.) I’m making an effort to consider all of the good things about Python and only mentioning Perl when there is a lack. I’d rather not compare them at all, but I don’t see a way around that without silly mental trickery.

Note that this is about Python 2. If you want to discuss Python 3, let’s compare it to Perl 6.

Generally awesome stuff about Python

The following are my main reasons for liking Python. They are in order of importance, and some have caveats.

Generators

Generators (a limited form of coroutine) are an awesome language feature. It took me a long time to understand why they are useful, but I think I can summarize it easily now:

What if you wanted to have a function with an infinite loop in the middle?

In Perl, the typical answer might be to build an iterator. This is fine, but it can be a lot of work. In Python, you just write normal code plus a special keyword, yield. For simple stuff, the closures you have available in Perl will likely seem less magical. But for complicated things, like iterating over the nodes in a tree, Python will almost surely be easier.
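
As an illustration of the hand-rolled Perl approach, here is a minimal closure-based iterator; the names are invented for the example:

sub fib_iterator {
  my ($x, $y) = (0, 1);
  return sub {            # each call yields the next Fibonacci number
    my $current = $x;
    ($x, $y) = ($y, $x + $y);
    return $current;
  };
}

my $next_fib = fib_iterator();
print $next_fib->(), "\n" for 1 .. 10;

This works, but the “infinite loop in the middle” has been turned inside out by hand, which is exactly the work yield saves you.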

Let me be clear: in my mind, generators are an incredibly important feature and that Perl lacks them is significant and terrible. There are efforts to get them into core, and there is a library that implements them, but it is not supported on the newest versions of Perl.

Builtins

Structured data is one of the most important parts of programming. Arrays are super important; I think that’s obvious. Hashes are, in my opinion, equally useful. Most other collection types are past the point of diminishing returns once hashes are well within reach, but a few are included in Python and I think that’s a good thing. To clarify, in Python, one could write:

cats = set(['Dantes', 'Sunny Day', 'Wheelbarrow'])
tools = set(['Hammer', 'Screwdriver', 'Wheelbarrow'])

print cats.intersection(tools)

In Perl that can be done with a hash, but it’s a hassle, so I tend to use Set::Scalar.
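
For comparison, here is the plain-hash version of that same intersection, which is the hassle I mean:

my @cats  = ('Dantes', 'Sunny Day', 'Wheelbarrow');
my @tools = ('Hammer', 'Screwdriver', 'Wheelbarrow');

my %is_cat = map { $_ => 1 } @cats;        # build the lookup by hand
my @both   = grep { $is_cat{$_} } @tools;  # keep tools that are also cats

print "@both\n";   # Wheelbarrow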

Python also ships with an OrderedDict, which is like Perl’s Tie::IxHash. But Tie::IxHash is sorta aging and weird and what’s with that name?

A Python programmer might also mention that the DefaultDict is cool. I’d argue that the DefaultDict merely works around Python’s insistence that the programmer be explicit about a great many things. That is: it is a workaround for Pythonic dogma.
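
The Perl counterpoint, for what it’s worth, is that plain hashes already autovivify, so the “default value” behavior comes for free:

my %count;
$count{$_}++ for qw(cat dog cat);   # missing keys spring into existence
print "$count{cat}\n";              # 2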

Rarely need a compiler for packages

In my experience, only very rarely do libraries need to be compiled in Python. So obviously math-intensive stuff like crypto or high precision stuff will need a compiler, but the vast majority of other things do not. I think part of the reason for this is that Python ships with an FFI library (ctypes). So awesome.

In Perl, even the popular OO framework Moose requires a compiler!

“protocols”

If you want to define your own weird kind of dictionary in Python, it’s really easy: you subclass dict and define around ten methods. It will all just work. This applies to all of Python’s builtins, I believe.

In Perl, you have to use tie, which is similar but you can end up with oddities related to Perl’s weird indirect method syntax. Basically, often things like print $fhobject $str will not work as expected. Sad camel.
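
For the curious, here is a minimal sketch of the tie approach; the class is invented for the example and only overrides STORE, inheriting everything else from Tie::StdHash:

package UpperCaseKeys;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

sub STORE {                      # store every key uppercased
  my ($self, $key, $value) = @_;
  $self->{uc $key} = $value;
}

package main;
tie my %h, 'UpperCaseKeys';
$h{cat} = 'Dantes';
print $h{CAT}, "\n";   # Dantes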

Interactive Python Shell

Python ships with an excellent interactive shell, which can be used by simply running python. It has line editing, history, builtin help, and lots of other handy tools for testing out little bits of code. I have lots of little tools to work around the lack of a good interactive shell in Perl. This is super handy.

Simple Syntax

The syntax of Python can be learned by a seasoned programmer in an afternoon. Awesome.

Cool, weird projects

I’ll happily accept more examples for this. A few spring to mind:

  1. BCC is sorta like DTrace, but for Linux.
  2. PyNES lets you run NES games written in Python.
  3. BITS is a Python based operating system, for doing weird hardware stuff without having to write C.

Batteries Included

Python ships with a lot of libraries, like the builtins above, that are not quite so generic. Some examples that I’ve used include a netrc parser, an IMAP client, some email parsing tools, and some stuff for building and working with iterators. The awesome thing is that I’ve written some fairly handy tools that in Perl would have certainly required me to reach for CPAN modules.

What’s not so awesome is that the libraries are clearly not of the high quality one would desire. Here are two examples:

First, the core netrc library can only select by host, instead of host and account. This was causing a bug for me when using OfflineIMAP. I rolled up my sleeves, cloned cpython, fixed the bug, and then found that it had been reported, with a patch, five years ago. Not cool.

Second, the builtin email libraries are pretty weak. To get the content of a header I had to use the following code:

import email.header
import re

# decode any RFC 2047 encoded-words and flatten the header back to a single string
decoded_header = str(email.header.make_header(email.header.decode_header(header)))
# unfold the header by removing the embedded line breaks
unfolded_header = re.sub('[\r\n]', '', decoded_header)

I’m not impressed.

There are more examples, but this should be sufficient.

Now before you jump on me as a Perl programmer: Perl definitely has some weak spots in its included libraries, but unlike with Python, the vast majority of those are actually on CPAN and can be updated without updating Perl. Unless I am missing something, that is not the case with the Python core libraries.

Prescriptive

The Python community as a whole, or at least my interaction with it, seems to be fairly intent on defining the one true way to do anything. This is great for new programmers, but I find it condescending and unhelpful. I like to say that the following (stolen from various media) together make up the programmer’s creed:

That which compiles is true.

Nothing is True and Everything is Permissible

“Considered Harmful” Considered Harmful

Generally not awesome stuff about Python

As before, these are things that bother me about Python, in order.

Variable Scope and Declaration

Python seems to aim to be a boring but useful programming language. Like Java, but a scripting language. This is a laudable goal and I think Go is the newest in this tradition. Why would a language that intends to be boring have any scoping rules that are not strictly and exclusively lexical? If you know, tell me.

In Perl, the following code would not even compile:

use strict;

sub print_x { print("$x\n") }
print_x();
my $x = 1;
print_x();

In Python, it does what a crazy person would expect:

def foo():
   print(x)

foo()
x = 1
foo()

The real problem here is that in Python, variables are never declared. It is not an error to set x = 1 in Python; how else would you create the variable? In Perl, you can declare a variable as lexical with my, global with our, and dynamic with local. Python is a sad mixture of lexical and global. The fact that anyone would ever need to explain scoping implies that it’s pretty weird.
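
For contrast, here is a quick sketch of those three Perl declarations; the variable names are invented:

use strict;

my  $lexical = 'visible only in the enclosing block or file';
our $setting = 'default';                      # a package (global) variable

sub show { print "$setting\n" }

sub with_override {
  local $setting = 'temporarily overridden';   # dynamic scope
  show();                                      # sees the override
}

with_override();
show();   # prints 'default' again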

PyPI and (the lack of) friends

I would argue that since the early 2000s, a critical part of a language is its ecosystem. A language that has no libraries is lonely, dreary work. Python has plenty of libraries, but the actual web presence of the ecosystem is crazily fractured. Here are some things that both search.cpan.org and MetaCPAN do that PyPI does not:

  • Include and render all of the documentation for all modules (example)
  • Include a web accessible version of all (or almost all) releases of the code (example, example)

And MetaCPAN does a ton more on top of that, with features I use all the time.

And there’s a constellation of other tools; here are my favorites:

  • CPANTesters aggregates test results from individuals and smoke machines, covering huge amounts of CPAN on a ton of operating systems. Does your module run on Solaris?
  • rt.cpan.org is a powerful issue tracker that creates a queue of issues for every module released on CPAN. Nowadays with GitHub that’s not as important as it used to be, but even with GitHub, RT still allows you to create issues without needing to log in.

Documentation

This is related to my first complaint about PyPI above. When I install software on my computer, I want to read the docs that are local to the installed version. There are two reasons for this:

  1. I don’t want to accidentally read docs for a different version than what is installed.
  2. I want to be able to read documentation when the internet is out.

Because the documentation of Python packages is so free form, people end up hosting their docs on random websites. That’s fine, I guess, but people end up not including the documentation in the installed module. For example, if you install boltons, you’ll note that while you can run pydoc boltons, there is no way to see this page via pydoc. Pretty frustrating.

On top of that, the documentation by convention is reStructuredText. rst is fine, as a format. It’s like markdown or POD (Perl’s documentation format) or whatever. But there are (at least) two very frustrating issues with it:

  1. There is no general linking format. In Perl, if I do L<DBIx::Class::Helpers>, it will link to the doc for that module. Because of the free form documentation in Python, this is impossible.
  2. It doesn’t render at all with pydoc; you just end up seeing all the noisy syntax.

And it gets worse! There is documentation for core Python that is stored on a wiki! A good example is the page about the time complexity of various builtins. There is no good reason for this documentation to not be bundled with the actual Python release.

matt’s script archive

As much as the prescriptivism of Python exists to encourage the community to write things in a similar style, a ton of old code still exists that is just as crappy as all the old Perl code out there.

I love examples, and I have a good one for this. My little Python project involves parsing RSS (and Atom) feeds. I asked around and was pointed at feedparser. It’s got a lot of shortcomings. The one that comes to mind is that if you want to parse feeds without sanitizing the included HTML, you have to mutate a global. Worse, this is only documented in a comment in the source code.

Unicode

Python has this frustrating behaviour when it comes to printing Unicode. Basically, if the programmer is printing Unicode (the string is not bytes, but meaningful characters) to a console, Python assumes that it can encode as UTF-8. If it’s printing to anything else it defaults to ASCII and will often throw an exception. This means you might have code that works perfectly well when you are testing it interactively, and even when it happens to print only ASCII while redirected to a file, but that throws an exception as soon as characters outside of ASCII show up. (Try it and see: python -c 'print(u"\u0420")' | cat) (Read more here.)

It’s also somewhat frustrating that the Python wiki complains that Python predates Unicode and thus cannot be expected to support it, while Perl predates even Python, but has excellent support for Unicode built into Perl 5 (the equivalent of Python 2.x.) A solid example that I can think of is that while Python encourages users to be aware of Unicode, it does not give users a way to compare strings ignoring case. Here’s an example of where that matters; if we are ignoring case, “ß” should be equal to “ss”. In Perl you can verify this by running: perl -Mutf8 -E'say "equal" if fc "ß" eq fc "ss"'. In Python one must download a package from PyPI which is documented as an order of magnitude slower than the core version from Python 3.

SIGPIPE

In Unix there is this signal, SIGPIPE, that gets sent to a process when the pipe it is writing to gets closed. This can be a simple efficiency improvement, but even ignoring efficiency, it comes up in everyday shell use. Imagine you have code that reads from a database, then prints a line, then reads, and so on. If you wanted the first 10 rows, you could pipe to head -n10 and both truncate after the 10th line and kill the program. In Python, this causes an exception to be thrown, so users of Python programs who know and love Unix will either be annoyed that they see a stack trace, or submit annoying patches to globally ignore SIGPIPE in your program.


Overall, I think Python is a pretty great language to have available. I still write Perl most of the time, but knowing Python has definitely been helpful. Another time I’ll write a post about being a polyglot in general.

Posted Thu, Apr 21, 2016

Humane Interfaces

In this post I just want to briefly discuss and demonstrate a humane user interface that I invented at work.

At ZipRecruiter, where I work, we use a third party system called Bonus.ly. Each employee is given $20 in the form of 100 Zip Points at the beginning of each month. These points can be given to any other employee for any reason, and then redeemed for gift cards basically anywhere (Amazon, Starbucks, Steam, REI, and even as cash with Paypal, just to name a few.)

Of course the vast majority of users give bonusly by using the web interface, where you pick a user with an autocompleter, you select the amount with a dropdown, and you type the reason and hashtag (you must include a hashtag) in a textfield. This is fine for most users, but I hate the browser because it’s so sluggish and bloated. The other option is to use the built in Slack interface. I used that for a long time; it works like this: /give +3 to @sara for Helping me with my UI #teamwork

This is pretty good but there is one major problem: the username above is based on the local part of an email address, even though when it comes to Slack using @foo looks a lot like a Slack username. I kept accidentally giving bonusly to the wrong Sara!

Bonusly has a pretty great API and one of my coworkers released an interface on CPAN. I used this API to write a small CLI script. The actual script is not that important (but if you are interested let me know and I’ll happily publish it.) What’s cool is the interface. First off here is the argument parsing:

my ($amount, $user, $reason);

for (@ARGV) {
  if (m/^\d+$/) {
    $amount = $_;
  } elsif (!m/#/) {
    $user = $_;
  } else {
    $reason = $_;
  }
}

die "no user provided!\n"   unless $user;
die "no amount provided!\n" unless $amount;
die "no reason provided!\n" unless $reason;

The above parses an amount, a user, and a reason for the bonus. The amount must be a positive integer, and the reason must include a hashtag. Because of this, we can ignore the ordering. This solves an unstated annoyance with the Slack integration of Bonusly; I do not have to remember the ordering of the arguments, I just type what makes sense!

Next up, the user validation, which resolves the main problem:

# The following just makes an array of users like:
# Frew Schmidt <frew@ziprecruiter.com>

my @users =
  grep _render_user($_) =~ m/$user/i,
  @{_b->users->list->{result}};


if (@users > 1) {
  warn "Too many users found!\n";
  warn ' * ' . _render_user($_) . "\n" for @users;
  exit 1;
} elsif (@users < 1) {
  warn "No users match! Could $user be a contractor?\n";
  exit 2;
}

The above keeps me from accidentally selecting one of many matching users by bailing out and asking the person running the script for a more specific match.

Of course the above UI is not perfect for every user. But I am still very pleased to have unordered positional arguments. I hope this inspires you to reduce requirements on your users when they are using your software.

Posted Sat, Apr 9, 2016

CloudFront Migration Update

When I migrated my blog to CloudFront I mentioned that I’d post about how it is going in late March. Well it’s late March now so here goes!

First off, I switched from the awscli tools to s3cmd because it does the smart thing and only syncs a file if its md5 checksum has changed. Not only does this make a sync significantly faster, it also reduces PUTs, which are a major part of the cost of this endeavour.

Speaking of costs, how much is this costing me? February, which was a partial month, cost a total of $0.03. One might expect March to cost more than four times that amount (still couch change) but because of the s3cmd change I made, the total cost in March so far is $0.04, with a forecast of $0.05. There is one cost that I failed to factor in: logging.

While my full blog is a svelte 36M, the CloudFront logs for the past 36 days alone are almost double that, and they are compressed with gzip! The logging incurs additional PUTs to S3 as well as an additional storage burden. The free tier includes 5G of free storage, but pulling down the log files as structured (a gzipped file per region per hour) is a big hassle. I had over five thousand log files to download, and it took about an hour. I’m not sure how I’ll deal with it in the future, but I may periodically pull down those logs, consolidate them, and replace them with a single rolled-up file per month.

Because the logs were slightly easier to interact with than before I figured I’d pull them down and take a look. I had to write a little Perl script to parse and merge the logs. Here’s that, for the interested:

#!/usr/bin/env perl

use 5.20.0;
use warnings;

use autodie;

use Text::CSV;

my $glob = shift;
my @values = @ARGV;
my @filelisting = glob($glob);

for my $filename (@filelisting) {
  open my $fh, '<:gzip', $filename;
  my $csv = Text::CSV->new({ sep_char => "\t" });
  $csv->column_names([qw(
      date time x_edge_location sc_bytes c_ip method host cs_uri_stem sc_status
      referer user_agent uri_query cookie x_edge_result_type x_edge_request_id
      x_host_header cs_protocol cs_bytes time_taken x_forwarded_for ssl_protocol
      ssl_cipher x_edge_response_result_type
  )]);
  # skip headers
  $csv->getline($fh) for 1..2;
  while (my $row = $csv->getline_hr($fh)) {
    say join "\t", map $row->{$_}, @values
  }
}

To get all of the accessed URLs, with counts, I ran the following oneliner:

perl read.pl '*.2016-03-*.gz' cs_uri_stem | sort | uniq -c | sort -n

There are some really odd requests here, along with some sorta frustrating issues. Here are the top thirty, with counts:

  27050 /feed
  24353 /wp-content/uploads/2007/08/transform.png
  13723 /feed/
   8044 /static/img/me200.gif
   5011 /index.xml
   4607 /favicon.ico
   3866 /
   2491 /static/css/styles.css
   2476 /static/css/bootstrap.min.css
   2473 /static/css/fonts.css
   2389 /static/js/bootstrap.min.js
   2384 /static/js/jquery.js
   2373 /robots.txt
    966 /posts/install-and-configure-the-ms-odbc-driver-on-debian/
    637 /wp-content//uploads//2007//08//transform.png
    476 /archives/1352
    311 /wp-content/uploads/2007/08/readingminds2.png
    278 /keybase.txt
    266 /posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/
    225 /archives/1352/
    197 /feed/atom/
    191 /static/img/pong.p8.png
    166 /posts/concurrency-and-async-in-perl/
    155 /n/a
    149 /posts/weirdest-interview-so-far/
    144 /apple-touch-icon.png
    140 /apple-touch-icon-precomposed.png
    133 /posts/dbi-logging-and-profiling/
    126 /posts/a-gentle-tls-intro-for-perlers/
    120 /feed/atom

What follows is pretty intense navel gazing that I suspect very few people care about. I think it’s interesting but that’s because like most people I am somewhat of a narcissist. Feel free to skip it.

So /feed, /feed/, /feed/atom, and /feed/atom/ are in this list a lot, and sadly when I migrated to CloudFront I failed to set up the redirect header. I’ll be figuring that out soon if possible.

/, /favicon.ico, and /index.xml are all normal and expected. It really surprises me how many things are accessing / directly. A bunch of it is people, but a lot is feed readers. Why they would hit / is beyond me.

/wp-content/uploads/2007/08/transform.png and /wp-content//uploads//2007//08//transform.png (from this page) seems to be legitimately popular. It is bizarrely being accessed from a huge variety of User Agents. At the advice of a friend I looked more closely and it turns out it’s being hotlinked by a Vietnamese social media site or something. This is cheap enough that I don’t care enough to do anything about it.

/wp-content/uploads/2007/08/readingminds2.png is similar to the above.

/static/img/me200.gif is an avatar that I use on a few sites. Not super surprising, but as always: astounded at the number.

/robots.txt is being accessed a lot, presumably by all the various feed readers. It might be worthwhile to actually create that file. No clue.

/static/css/* and /static/js/* should be pretty obvious. I would consider using those from a CDN but my blog is already on a CDN so what’s the point! But it might be worth at least adding some headers so those are cached by browsers more aggressively.

/posts/install-and-configure-the-ms-odbc-driver-on-debian/ (link) is apparently my most popular post, and I would argue that that is legitimate. I should automate some kind of verification that it continues to work. I try to keep it updated but it’s hard now that I’ve stopped using SQL Server myself.

/archives/1352 and /archives/1352/ are the pre-Hugo URLs for the announcement of DBIx::Class::DeploymentHandler. I’m not sure why the old URL is being linked to, but I am glad I put all that effort into ensuring that old links keep working.

/keybase.txt is the identity proof for Keybase (which I have never used by the way.) It must check every four hours or something.

/posts/replacing-your-cyanogenmod-kernel-for-fun-and-profit/ (link) is a weird post of mine, but I’m glad that a lot of people are interested, because it was a lot of work to do.

/static/img/pong.p8.png, /posts/weirdest-interview-so-far/ (link), and /posts/dbi-logging-and-profiling/ (link) were all on / at some point in the month so surely people just clicked those from there.

/posts/concurrency-and-async-in-perl/ (link) and /posts/a-gentle-tls-intro-for-perlers/ (link) are more typical posts of mine, but are apparently pretty popular and I would say for good reason.

/n/a, /apple-touch-icon.png, /apple-touch-icon-precomposed.png all seem like some weird user agent thing, like maybe iOS checks for that if someone makes a bookmark?

World Wide Readership

Ignoring the seriously hotlinked image above, I can easily see where most of my blog is accessed:

perl read.pl '*.2016-03-*.gz' cs_uri_stem x_edge_location  | \
  grep -v 'transform' | cut -f 2 | perl -p -e 's/[0-9]+//' | \
  sort | uniq -c | sort -n

Here’s the top 15 locations which serve my blog:

  21330 JFK # New York
   9668 IAD # Washington D.C.
   8845 ORD # Chicago
   7098 LHR # London
   6536 FRA # Frankfurt
   5319 DFW # Dallas
   4568 ATL # Atlanta
   4328 SEA # Seattle
   3345 SFO # San Francisco
   3137 CDG # Paris
   2991 AMS # Amsterdam
   2966 EWR # Newark
   2339 LAX # Los Angeles
   1993 ARN # Stockholm
   1789 WAW # Warsaw

I’m super pleased at this, because before the migration to CloudFront all of this would have been served from a single server in DFW. That was almost surely enough, but it would have been slower, especially for readers outside of the States.


Aside from the fact that I have not yet set up the redirect for the old feed URLs, I think the migration to CloudFront has gone very well. I’m pleased that I’m less worried about rebooting my Linode and that my blog is served quickly, cheaply, and efficiently to readers worldwide.

Posted Sat, Mar 26, 2016

DBI Logging and Profiling

If you use Perl and connect to traditional relational databases, you use DBI. Most of the Perl shops I know of nowadays use DBIx::Class to interact with a database. This blog post is about how I “downported” some of my DBIx::Class ideas to DBI. Before I say much more I have to thank my boss, Bill Hamlin, for showing me how to do this.

Ok, so when debugging queries with DBIx::Class, you can set the DBIC_TRACE environment variable and see the queries that the storage layer is running. Sadly sometimes the queries end up mangled, but that is the price you pay for pretty printing.

You can actually get almost the same thing with DBI directly by setting DBI_TRACE to SQL. That is technically not supported everywhere, but it has worked everywhere I’ve tried it. If I recall correctly though, unlike with DBIC_TRACE, using DBI_TRACE=SQL will not include any bind arguments.

Those two features are great for ad hoc debugging, but at some point in the lifetime of an application you want to count the queries executed during some workflow. The obvious example is during the lifetime of a request. One could use DBIx::Class::QueryLog or something like it, but that will miss queries that were executed directly through DBI, and it’s also a relatively expensive way to just count queries.

The way to count queries efficiently involves using DBI::Profile, which is very old school, like a lot of DBI. Here’s how I got it to work just recording counts:

#!/usr/bin/env perl

use 5.12.0;
use warnings;

use Devel::Dwarn;
use DBI;
use DBI::Profile;
$DBI::Profile::ON_DESTROY_DUMP = sub{};

my $dbi_profile = DBI::Profile->new(
  Path => [sub { $_[1] eq 'execute' ? ('query') : (\undef) }]
);

$DBI::shared_profile = $dbi_profile;

my $dbh = DBI->connect('dbi:SQLite::memory:');
my $sth = $dbh->prepare('SELECT 1');
$sth->execute;
$sth->execute;
$sth->execute;

$sth = $dbh->prepare('SELECT 2');
$sth->execute;
$sth->execute;
$sth->execute;

my @data = $dbi_profile->as_node_path_list;
Dwarn \@data;

And in the above case the output is:

[
  [
    [
      6,
      "6.67572021484375e-06",
      "2.86102294921875e-06",
      0,
      "2.86102294921875e-06",
      "1458836436.12444",
      "1458836436.12448"
    ],
    "query"
  ]
]

The outermost arrayref is supposed to contain all of the profiled queries, so each arrayref inside of that is a query, with its profile data as the first value (another arrayref) inside, and all of the values after that first arrayref are user configurable.

So the above means that we ran six queries. There are some numbers about durations but they are so small that I won’t consider them carefully here. See the link above for more information. Normally if you had used DBI::Profile you would see two distinct queries, with a set of profiling data for each, but here we see them all merged into a single bucket. All of the magic for that is in my Path code references.

Let’s dissect it carefully:

$_[1] eq 'execute' # 1
  ? ('query')      # 2
  : (\undef)       # 3

Line 1 checks the DBI method being used. This is how we avoid hugely inflated numbers. We are trading off some granularity here for a more comprehensible number. See, if you prepare 1000 queries, you are still doing 1000 roundtrips to the database, typically. But that’s a weird thing, and telling a developer how many “queries they did” is easier to understand when that means simply executing the query.

In line 2 we return ('query'). This is what causes all queries to be treated as if they were the same. We could have returned any constant string here. If we wanted to do something weird, like count based on type of query, we could do something clever like the following:

return (\undef) unless $_[1] eq 'execute';  # skip everything but execute
local $_ = $_;

s/^\s*(\w+)\s+.*$/$1/;
return ($_);

That would create a bucket for SELECT, UPDATE, etc.

Ok back to dissection; line 3 returns (\undef), which is weird, but it’s how you signal that you do not want to include a given sample.


So the above is how you generate all of the profiling information. You can be more clever and include caller data or even bind parameters, though I’ll leave those as a post for another time. Additionally, you could carefully record your data and then do some kind of formatting at read time. Unlike DBIC_TRACE where you can end up with invalid SQL, you could use this with post-processing to show a formatted query if and only if it round trips.

Now go forth; record some performance information and ensure your app is fast!

Posted Thu, Mar 24, 2016

How to Enable ptrace in Docker 1.10

This is just a quick blog post about something I got working this morning. Docker currently adds some security to running containers by wrapping them in both AppArmor (or presumably SELinux on RedHat systems) and seccomp BPF-based syscall filters. This is awesome, and turning either or both off is not recommended. Security is a good thing and learning to live with it will make you have a better time.

Normally ptrace is disabled by the default seccomp profile. ptrace is used by the incredibly handy strace. If I can’t strace, I get the feeling that the walls are closing in, so I needed it back.

One option is to disable seccomp filtering entirely, but that’s less secure than just enabling ptrace. Here’s how I enabled ptrace but left the rest as is:

A handy perl script

#!/usr/bin/perl

use strict;
use warnings;

# for more info check out https://docs.docker.com/engine/security/seccomp/

# This script simply helps to mutate the default docker seccomp profile.  Run it
# like this:
#
#     curl https://raw.githubusercontent.com/docker/docker/master/profiles/seccomp/default.json | \
#           build-seccomp > myapp.json

use JSON;

my $in = decode_json(do { local $/; <STDIN> });
push @{$in->{syscalls}}, +{
  name => 'ptrace',
  action => 'SCMP_ACT_ALLOW',
  args => []
} unless grep $_->{name} eq 'ptrace', @{$in->{syscalls}};

print encode_json($in);

In action

So without the custom profile you can see ptrace not working here:

$ docker run alpine sh -c 'apk add -U strace && strace ls'
fetch http://dl-4.alpinelinux.org/alpine/v3.2/main/x86_64/APKINDEX.tar.gz
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
strace: test_ptrace_setoptions_for_all: PTRACE_TRACEME doesn't work: Operation not permitted
strace: test_ptrace_setoptions_for_all: unexpected exit status 1

And then here is using the profile we generated above:

$ docker run --security-opt "seccomp:./myapp.json" alpine sh -c 'apk add -U strace && strace ls'
2016/03/18 17:08:53 Error resolving syscall name copy_file_range: could not resolve name to syscall - ignoring syscall.
2016/03/18 17:08:53 Error resolving syscall name mlock2: could not resolve name to syscall - ignoring syscall.
fetch http://dl-4.alpinelinux.org/alpine/v3.2/main/x86_64/APKINDEX.tar.gz
(1/1) Installing strace (4.9-r1)
Executing busybox-1.23.2-r0.trigger
OK: 6 MiB in 16 packages
execve(0x7ffe02456c88, [0x7ffe02457f30], [/* 0 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x7f0df919c048) = 0
set_tid_address(0x7f0df919c080)         = 16
mprotect(0x7f0df919a000, 4096, PROT_READ) = 0
mprotect(0x5564bb1e7000, 16384, PROT_READ) = 0
getuid()                                = 0
ioctl(0, TIOCGWINSZ, 0x7ffea2895340)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
ioctl(1, TIOCGWINSZ, 0x7ffea2895370)    = -1 ENOTTY (Not a tty)
stat(0x5564bafdde27, {...})             = 0
open(0x5564bafdde27, O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 512
lstat(0x5564bb1ec860, {...})            = 0
lstat(0x5564bb1ec900, {...})            = 0
lstat(0x5564bb1ec9a0, {...})            = 0
lstat(0x5564bb1eca40, {...})            = 0
lstat(0x5564bb1ecae0, {...})            = 0
lstat(0x5564bb1ecb80, {...})            = 0
lstat(0x5564bb1ecc20, {...})            = 0
lstat(0x5564bb1eccc0, {...})            = 0
lstat(0x5564bb1ecd60, {...})            = 0
lstat(0x5564bb1ece00, {...})            = 0
lstat(0x5564bb1ecea0, {...})            = 0
lstat(0x5564bb1ecf40, {...})            = 0
lstat(0x5564bb1ecfe0, {...})            = 0
lstat(0x7f0df919e6e0, {...})            = 0
lstat(0x7f0df919e780, {...})            = 0
bin
dev
etc
home
lib
linuxrc
media
mnt
proc
root
run
sbin
sys
tmp
usr
var
lstat(0x7f0df919e820, {...})            = 0
getdents64(3, 0x5564bb1ec040, 2048)     = 0
close(3)                                = 0
ioctl(1, TIOCGWINSZ, 0x7ffea2895278)    = -1 ENOTTY (Not a tty)
writev(1, [?] 0x7ffea2895210, 2)        = 4
writev(1, [?] 0x7ffea2895330, 2)        = 70
exit_group(0)                           = ?
+++ exited with 0 +++

A final warning

The above is not too frustrating and is more secure than disabling seccomp entirely, but enabling ptrace as a general course of action is likely to be wrong. I am doing this because it helps with debugging stuff inside of my container, but realize that for long running processes you can always strace processes that are running in the container from the host.

Posted Fri, Mar 18, 2016

When I Planned on Moving to Australia

Many of you do not know that I was born on the Gulf Coast of Mississippi. I lived there, with a brief intermission in Oconomowoc, Wisconsin, until I moved to Texas to go to college.

That first year of school is rife with good memories, but there was a dark spot. Specifically, Hurricane Katrina.

Katrina was a big deal. To this day there are houses that are just gone, with nothing but a slab and a lot of weeds in their place.

You people who do not live in a place where hurricanes are common (two or more per year) likely think of this like you might think of someone who left their house unlocked or something. The problem is, as I’ve already given away, hurricanes are not rare on the Gulf Coast. They happen multiple times every year and the vast majority are non-events.

You judge people for not evacuating, but that’s because you can do that safely from your living room. Nearly every summer in Mississippi I experienced at least one power outage lasting days due to a hurricane. The worst flooding ever did was damage the carpet in my awesome detached bedroom.


Katrina started like all hurricanes do: on the news. Doom was predicted. Most hurricanes would hit the barrier islands and suddenly slow down, and the twenty-four hour news would stop focusing on southern Mississippi and go back to the murders and rapes the viewers so eagerly lap up.

I vividly remember watching the news and seeing Katrina heading directly for The Coast, confidently assuming that it would veer east into the sore thumb of Florida.

Then it didn’t.


After Katrina landed, cell phones stopped working. My cell, even in TX, was not reachable for weeks, even by other people in TX. I couldn’t reach my family via cell or landline. I was positive that my entire nuclear family was wiped out. In a daze I expected to run away to Australia, selling what little I owned, with the hopes of buying a motorcycle after I arrived.

Hurricane Sandy has been much more widely discussed, presumably because she was more recent and more weird (ending up where she did.) Take note of the statistics that Wikipedia helpfully includes at the top right of the article: more than a thousand people died because of Katrina. You can’t even imagine that many human beings. That’s a lot.

After a week or two I finally was able to determine that my family survived. Many people look down on Facebook, Twitter, and other social media because it didn’t exist when they were kids or some stupid thing like that. If it had not been for LiveJournal I would not have found out that my family was still alive.

Here’s a quote from August 31, 2005 from my blog at the time:

So [ … ] a few other discouraging things happened. I won’t go into details; it would take too long. Just know that it was a bummer and didn’t feel great. And then this hurricane comes blowing through everywhere and I have [wtf sic?] if my family is even alive. I am nearly positive that my house is gone, but that’s really no big deal. Anyways. What a bummer eh? Don’t pity me. Pray for my family.

Note: If you have seen The Life Aquatic, I feel like Steve Zissou did when his ship got pirated and everything turned red and he shot everyone.

I can’t find the post, because I think it was on someone else’s page, but I found out that my family was ok when my friend Kate Mendoza (née Jarvis) commented on some post saying that she knew my family had survived.


If there are lessons to learn from this story, I think they are:

  1. You could lose it all pretty quickly; enjoy your loved ones while you can.
  2. Stop looking at a finger when someone is pointing at the moon; social media is people.
Posted Sat, Mar 12, 2016

Weirdest Interview So Far

This is a pretty good story and I want you all to hear it. When I was graduating from college I interviewed with three companies. Two of them (MTSI and Rockwell Collins) offered me jobs. The other one, Empire Systems Inc., did not.

So Empire Systems Inc. was founded by a couple LeTourneau (my alma mater) alumni. I didn’t know the two of them myself, but we did overlap, by a year or two. Anyway so back to the interviews.

They came to school for the career fair and gave their spiel, which changed a couple times, but what I recall was something like the (non-existent at the time) iPad, called the Marquess. They needed EEs and programmers and whatnot. It sounded kinda cool and I really wanted to work there, so I submitted my (probably worthless) resume. I got an interview; I suspect everyone who submitted a resume did, but who knows.

Interview I

So they invited me into some room; I think it was a classroom, but it could have been some kind of break room. The important thing to know is that it had a projector. We introduced ourselves, sat down, and got started.

The first question was likely something boring, like, “How would you handle conflict on the job?” I don’t remember. What I do remember is that the next question was: “There was a dot moving around on the projector while you answered; which quadrant was it primarily in?”

The rest of the interview went like that, where they’d ask a normal question, and then some oddball thing. The other oddball questions I remember were:

  • “We’re going to show faces of famous people; identify them.”
  • “Identify the following animals as predator or prey.”

That last question, with the animals, was especially funny, because I watched the first two (tiger and shark?) and said “predator,” and then didn’t say any more, assuming that they all were. I was wrong but didn’t correct myself.

Interview II

I remember waiting a long time to hear back. I followed up a couple times before getting the coveted on site interview. But finally they raised the capital they needed to hire more engineers. They had me, another peer, and some others out. I remember vividly that for some reason, presumably lack of planning, they brought us via train instead of plane, even though it was not any cheaper. (As an aside, that train ride was when AJAX finally clicked for me, and I think I leveled up as an engineer, I guess to level 1?)

I remember two things about that situation. The first was that the interview, again, was crazy. It first featured a test. No; not a quiz where you have to implement sorting algorithms or whatever. This quiz asked details about the history of France and various parts of the Middle East. It was huge too, like, maybe twenty or thirty pages. Then after that they did a group interview. And by that I mean: one interviewer to many interviewees. So they’d ask a question and we’d all have to vie for their attention or whatever.

I recall them asking something like, “if you could invent anything, what would it be?” I think that’s a good question and it has made me think a lot ever since. Aside from that the whole thing was a joke.

The other thing I recall, with fondness, was their showing off of how it was a lifestyle job. They talked about doing karate before work and wearing ties every day and all kinds of stupid stuff like that, but it didn’t matter because it was all vapor (keep reading.) The good memory was that they had us over for a game night with some of the people who worked at Empire and their spouses and friends. That was a lot of fun. It felt really homey.

Backstory and Embezzlement

One of my friends at MTSI actually did a project with the founder of Empire in college. The friend in question told me that said founder tended to be very slippery when it came to his part of the project and had some bogus excuses relating to his other classes. Specifically he “charmed the profs with just mountains of documentation and presentation skills.”

Well, it turns out he was slippery in real life too. About a year after I joined MTSI, someone sent a link about how the founder effectively stole half a million bucks. The whole thing sorta reminds me of this.

I hope you enjoyed this little walk down memory lane; I know I did!

Posted Sun, Mar 6, 2016

Migrating My Blog from Linode to CloudFront

Motivations

I have just completed the process of migrating my blog to CloudFront. There are a few reasons for this. Initially I had planned to migrate everything on my Linode to OVH, which has DDoS mitigation and, I think, even uptime SLAs. The reasoning behind that was that the Linode kept getting DDoS’ed and I was sick of it.

Additionally, in January I went to SCALE14x and Eric Hammond (who was introduced to me by Andrew Grangaard) pointed out that by using the current generation of AWS tooling (Lambda, DynamoDB, etc) you can reduce total cost to less than the minimum pricing on a Linode. The cost of my Linode isn’t super expensive (less than the price of Netflix) but every little bit helps. On top of that we use the AWS stuff at work so another chance to be familiar with AWS is a good thing.

Finally, after the most recent security fiasco I just feel safer using infrastructure that is more well tested in general. Plus I think I can get away with moving most of my stuff off of VMs, which means I’m less likely to screw something up.

As a side note, I have been self-hosting my blog since 2007. I am loath to use external hosting, as external hosts all seem to end up dying at some point anyway. I did briefly consider hosting on GitHub, but you either have to change your domain name (frioux.github.io) or have no TLS (more on that later), so I decided to go the manual hosting route.

Howto

For small stuff like this, it can be worthwhile to make a distinct AWS account for each project. I made a special blog account to help me with accounting if the total cost of this ends up being more than I expect. Because I have my own domain I have as many email addresses as I want, so I just made a new one specifically for my blog, and then used it to make a new AWS account.

After creating the blog account I enabled Cost Explorer. I have no idea why this has to be turned on, because it’s super helpful to be able to use. Next I activated MFA (you know, for security!) Maybe I should have done that first. I could do something with IAM, I’m sure, but it would be overkill for something as single-task as this that only I will ever use.

I followed instructions I found here to set up the S3 and CloudFront parts. The only issue I ran into was that I forgot to set the CNAME both in DNS and in the CloudFront config. To actually sync my blog I use the following command:

aws s3 sync --delete . s3://blog.afoolishmanifesto.com

The --delete flag is so that files that aren’t in the remote side get removed.

At this point you should be able to test that everything is mostly working by visiting the endpoint that the bucket provides. The CloudFront part usually takes a while because it has to sync all over the world and wait for DNS too.

Because I care about my readers I only serve my blog over HTTPS. It’s not that I think you are reading my blog in secret; I don’t want malware to be injected by messed up access points. Because of that I had to get a certificate. If I were serving from US East I could have gotten free, auto-renewing certificates from Amazon. Sadly I didn’t think to do this, even though it would have been trivial since I don’t really care where the site is served from. StartSSL also gives free certificates, so that’s what I used. To upload your certificate you need to use a command like this:

aws iam upload-server-certificate \
      --server-certificate-name blog_cert \
      --certificate-body file://blog.afoolishmanifesto.com/ApacheServer/2_blog.afoolishmanifesto.com.crt \
      --private-key file://blog.afoolishmanifesto.com.priv \
      --certificate-chain file://$(pwd)/blog.afoolishmanifesto.com/ApacheServer/1_root_bundle.crt \
      --path /cloudfront/blog/

Getting and creating the certificate is not something I’m super interested in writing about, as it’s pretty well documented already.

Benefits

Clearly the fact that I pulled the trigger on this project means that I think it was worth it, so here are some of the benefits to using CloudFront to host my blog.

Pricing

After reading the nightmare glacier post last month I committed to reading and understanding the pricing models of the various AWS services before using them. With that in mind I read about the pricing of the stuff I’ll be using for my blog before embarking on this project.

The S3 Pricing is pretty understandable. I’ll pay 3¢/mo for the storage, as my blog is about 35 MB of HTML and images total. Uploading the entire blog afresh (which I sorta assume is what sync does, but I’m not sure) is about 16k files, which (rounding up) is 2¢. So if sync works inefficiently a post is likely to cost me about 5¢, including fixing typos or whatever. Assuming a lot of posts, let’s say sixteen a month, that adds up to 80¢ per month. There is no charge to transfer from S3 to CloudFront, so that adds up to a maximum of 83¢ per month for S3.

The CloudFront Pricing is even simpler. Assuming 100% of the traffic from my Linode is my blog (it is without a doubt mostly IRC, but for accounting purposes let’s assume the worst) and it is all from India (again, nope), the charge from CloudFront will be 51¢. Assuming every single request on my server is to my blog (another verifiable falsehood), that would add a whopping $1.10 to the monthly bill. That adds up to $1.61 per month for CloudFront.

So, worst case, my monthly bill is $2.44. I suspect it will likely be much less than that. I’ll try to remember at the end of March to update this post with what the real price ends up being.

Global

Unlike my Linode, which always resided in the wonderful city of Dallas, TX, CloudFront specifically exists to be global. So if you read my blog from the UK (I’m sure there are some!) or Japan (eh… maybe not) it should be a lot more snappy now.

Isolation

Sometimes my Linode gets rebooted for hypervisor updates; or worse, I mess up my Apache config or something. The above setup is well isolated from all my other stuff, so it should be very reliable.

Drawbacks

But it’s not all unicorns, rainbows, penny-whistles, and blow. There are some problems!

Pricing

The above calculations are based on past history. If I get DDoS’d directly I will suddenly get a bill for a thousand bucks, instead of my server just falling over. That’s something that gives me serious pause. My boss told me that you can use Lambda as a rate-limiting tool. I expect to look into that before too long, especially because I have other plans for Lambda anyway.

Slow to Update

Unsurprisingly, because CloudFront is a CDN, there is a TTL on the cached data, so sometimes it can take a few minutes for a modification to the blog to go live. Not a huge deal, but good to know anyway.


Overall this has been a relatively painless process and I think it is worth it. I hope this helps anyone considering migration to AWS.

Posted Sat, Feb 20, 2016

UCSPI

While CGI is a fairly well established, if aging, protocol, UCSPI seems fairly obscure. I suspect that UCSPI may see a resurgence, since with systemd, projects finally have a reason to support running in such a mode. But here I go, burying the lede.

CGI Refresher

Just as a way of illustrating by example, I think that I should explain (hopefully only by way of reminder) how CGI works. Basically a server (usually Apache, IIS, or lately, nginx) waits for a client to connect, and when it does, it parses the request and all of the request headers. They look something like this:

POST /upload?test=false HTTP/1.1
User-Agent: blog/0.0.1
Content-Type: text/plain
Content-Length: 4

frew

And then various parts of the above go into environment variables; for example test=false would become the value of QUERY_STRING. Then the body (in this example, frew) would be written to the standard input of the CGI script. While this seems a little fiddly compared to some of the more modern APIs and frameworks, it is nice because you don’t even need a language that supports sockets. You can even write a simple script with a shell!
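
As an illustration, here is a minimal sketch of a CGI script in Perl that echoes those pieces back (the Status header it prints is discussed in the next section):

#!/usr/bin/env perl
use strict;
use warnings;

# the server puts request metadata into the environment...
my $query  = $ENV{QUERY_STRING}   || '';
my $length = $ENV{CONTENT_LENGTH} || 0;

# ...and the request body on standard input
my $body = '';
read STDIN, $body, $length if $length;

# the response is whatever we print, headers first
print "Status: 200 OK\r\n";
print "Content-Type: text/plain\r\n\r\n";
print "query was: $query\n";
print "body was: $body\n";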

ugh

The response is almost just whatever the script prints to standard out, though perversely there is a small bit of modification that happens, so the server has to parse some of the output, which seems like a huge oversight in the specification. Specifically, instead of allowing the script to print:

HTTP/1.1 200 OK
Content-Type: text/html
...

It instead must write:

Status: 200 OK
Content-Type: text/html
...

and is even allowed to write:

Content-Type: text/html
Status: 200 OK
...

but the server still must translate that to:

HTTP/1.1 200 OK
Content-Type: text/html
...

This means that if the server works correctly it may need to buffer an unbounded (by the spec) number of headers before it gets to the Status header. Ah, the joys of implementing a CGI server.

What is UCSPI

UCSPI stands for Unix Client Server Program Interface. Basically the way it works is that you have a tool that opens a socket and waits for new connections. When it gets a new connection it spins up a new process, setting up pipes between standard input and standard output of that process and to the input and output of the socket.
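
As a tiny illustration, a UCSPI handler can be as simple as this Perl sketch; it never touches sockets itself because the listener (tcpserver, s6-tcpserver, nosh’s tcp-socket-accept, and so on) has already wired the accepted socket to its standard input and output:

#!/usr/bin/env perl
use strict;
use warnings;

# the accepted connection is already on STDIN/STDOUT; just read and write
while (my $line = <STDIN>) {
  $line =~ s/\r?\n\z//;
  last if $line eq 'quit';
  print "you said: $line\r\n";
}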

Here’s an interesting thing that I have needed to do that I could not do without UCSPI. Because each connection in the UCSPI model ends up being a separate set of processes, a connection handler can restart the parent UCSPI worker and still finish its own connection.

This means that, for example, I can have a push to github automatically update my running server, without any weird side mechanisms like a second updater service or worse, a cronjob. I just do a git fetch and git reset --hard @{u}, and the next time a client connects it will be running the new code.

Here is how I did that. At some point I expect to make the automatic updater more reliable and generic.

Another sorta nice thing, though this is very much a tradeoff, is that the process that holds the listening port is very small (2M on my machine) compared to, say, an actual Plack server (which is an order of magnitude bigger.) On the other hand, if your actual CGI script has a lot of dependencies it can take a long time to start, so this may not be a good long-term solution.

Note that there are problems, of course. Aside from the increased cost of spinning up a new process for each connection, you also have to be careful to avoid printing to standard out. If you do, you are almost certain to print your output before any headers, which ends up being an invalid response. On the other hand you can do a bunch of weird old-school stuff, like chdiring in a script, and not worry about global state changes.

Aside: Plack under CGI

Because my web apps thus far have been implemented using PSGI, an abstraction of HTTP, they can run under their own servers or under CGI directly. I only really needed to do one of two things to make my application run under CGI:

I hope you found this interesting!

Posted Wed, Feb 10, 2016

Rust

I’ve really enjoyed writing Rust, lately. I posted yesterday about what I’m doing with it. In the meantime here are some immediate reactions to writing Rust:

Documentation

The documentation is pretty good. It could be better, like if every single method had an example included, but it could be a lot worse. And the fact that a lot (though not all for some reason) of the documentation has links to the related source is really handy.

Language Itself

The language feels good. This is really hard to express, but the main thing is that type inference makes a lot of the type definitions feel less burdensome than in, for example, Java and friends. It also feels stratospherically high level, with closures, object orientation, destructuring, handy methods on basic types like strings, and much more. Yet it’s actually pretty low level.

Community

The community is awesome! I have never had as many friendly and willing people help me as a total noob before. Maybe it’s because Rust has a code of conduct or maybe it’s because Mozilla are nice people. I appreciate that there are people who actually know what is up answering questions at all hours of the night; they also generally assume competence. While assuming competence may make the total number of questions asked greater, it makes the entire exchange much more pleasant. More of this please!

Error Messages

The error messages are very good. For example, check this out:

$ rustc httpd.rs
httpd.rs:84:42: 84:43 error: unresolved name `n`. Did you mean `v`? [E0425]
httpd.rs:84             Ok(v) => { *content_length = n },
                                                     ^
httpd.rs:84:42: 84:43 help: run `rustc --explain E0425` to see a detailed explanation
error: aborting due to previous error

They all give some context like that, and then have an error code (the --explain thing) that lets you get a more complete description of what you did and how you can fix it. Sometimes the errors can be pretty inscrutable for a new user though:

$ rustc httpd.rs
httpd.rs:216:31: 219:7 error: the trait `core::ops::FnOnce<()>` is not implemented for the type `()` [E0277]
httpd.rs:216     let mut c_stdin = f.stdin.unwrap_or_else({
httpd.rs:217         warn!("Failed to get child's STDIN");
httpd.rs:218         early_exit("500 Internal Server Error");
httpd.rs:219     });
httpd.rs:216:31: 219:7 help: run `rustc --explain E0277` to see a detailed explanation
error: aborting due to previous error

Searchability

Searching for examples of stuff online is surprisingly hard. I don’t know if that’s because Rust is also the name of a popular video game or if it’s just because the language is fairly new. I hope to help remedy this in general.

Etc

There is certainly more, like the included package management system or other interesting language features. I may post more about those later, but the above is stuff that I ran into during my week-long foray into Rust. Hope this helps!

Posted Tue, Feb 9, 2016

Announcing cgid

This post is an announcement of cgid.

Over the past week I developed a small, UCSPI-based, single-file CGI server. The usage is very simple, due to the nature of the tool. Here’s a quick example of how I use it:

#!/bin/nosh
tcp-socket-listen 127.0.0.1 6000
tcp-socket-accept --no-delay
cgid
www/cgi-bin/my-cgi-script

If you don’t know anything about UCSPI, this will look like nonsense to you. I have a post that I’ll publish later this week about UCSPI, so you can wait for that, or you can search for it and find lots of documents about it already.


Rust

As a side note, cgid was written in Rust. I have a post about Rust itself in the queue, but I think discussing the “release process” of a binary tool like cgid at release time is sensible. The procedure for releasing went something like this:

git tag v0.1.0 -m 'Release v0.1.0'

# release to crates.io
cargo package
cargo publish

cargo build --release
# fiddle with github webpage to put binaries on the release

This is a joke compared to the spoiling I’ve had from Dist::Zilla, which is what I use when releasing packages to CPAN. At some point I’d like to automate Rust releases as much as Rik has automated releasing to CPAN.

I’ll keep my eye out for more things that deserve to be written in Rust, as I enjoyed the process, but I expect that such ideas are few and far between, for me. It is pretty cool that, basically not knowing Rust, I successfully implemented a tool that doesn’t exist anywhere else in less than two weeks.

Posted Mon, Feb 8, 2016

Handy Rust Macros

I’ve been writing some Rust lately and have been surprised at the dearth of examples that show up when I search for what seems obvious. Anyway, I wrote a couple macros that I’ve found very handy. The first seems like it should almost be core:

// assumes `use std::io::{self, Write};` is in scope where these are used
macro_rules! warn {
    ($fmt:expr) => ((writeln!(io::stderr(), $fmt)).unwrap());
    ($fmt:expr, $($arg:tt)*) => ((writeln!(io::stderr(), $fmt, $($arg)*)).unwrap());
}

// Examples:
warn!("This goes to standard error");
warn!("Connected to host: {}", hostname);

This allows you to trivially write to standard error; it panics if the write fails. If it weren’t for that final detail I’d actually submit it as a pull request for Rust itself. For my code, being able to print to the standard filehandles is critical, so crashing if STDERR is closed makes sense, but there are many situations where that is not reasonable.

The next example is the more interesting one, a macro that uses an environment variable at compile time to modify what it does:

macro_rules! debug {
    ($fmt:expr) => (
        match option_env!("HTTPD_DEBUG") {
            None => (),
            Some(_) => warn!($fmt),
        }
    );
    ($fmt:expr, $($arg:tt)*) => (
        match option_env!("HTTPD_DEBUG") {
            None => (),
            Some(_) => warn!($fmt, $($arg)*),
        }
    );
}

// Examples:
debug!("This goes to standard error");
debug!("Connected to host: {}", hostname);

debug! works just like warn!, but if the HTTPD_DEBUG environment variable is unset at compile time, it is as if nothing was ever written. Sorta handy, but what’s more important is the general pattern.

I hope to be blogging more about Rust in the future. I hope this helps!

Posted Sat, Feb 6, 2016

Checking sudoers with visudo in SaltStack

At work we are migrating our server deployment setup to use SaltStack. One of the things we do at deploy time is generate a sudoers file, but as one of our engineers found out, if you do not verify the contents of the sudoers file before deploying it you will be in a world of hurt.

Salt actually has a pretty good built-in tool for this, but it’s very poorly documented. This is one of the most obvious uses for it, and because Googling for it didn’t work for me I figured I’d write it up so that it works for someone else.

The feature is the check_cmd flag on file.managed. The current documentation for the feature is:

The specified command will be run with the managed file as an argument. If the command exits with a nonzero exit code, the state will fail and no changes will be made to the file.

This isn’t super clear. What actually happens is that Salt takes the generated content, puts it in a tmpfile, runs the command with the tmpfile path appended, and only replaces the real file with the tmpfile’s contents if the command succeeds. So here is how I used it to verify sudoers:

sudo.config_file:
  file.managed:
    - name: {{ sudo.config_file.name }}
    - user: root
    - group: root
    - mode: 0440
    - source: {{ sudo.config_file.source }}
    - template: {{ sudo.config_file.template }}
    - check_cmd: /usr/sbin/visudo -c -f
    - require:
      - pkg: sudo
      - group: sudo

Hope this helps!

Posted Thu, Jan 14, 2016

Pong for Pico-8

I originally wrote this for the Pico-8 Fanzine but it was sadly not accepted. I still had a lot of fun writing in a totally different style than usual. Imagine the following has been printed out, scanned, and reprinted maybe five times.

Pico-8 is a “fantasy console.” It’s a reimagined 8-bit console sorta like the Commodore 64 but with Lua as the primary language instead of BASIC. It’s very fun to play with and I think anyone interested in making games would do well to get it, even if it’s nothing like real-life games. It takes away the superficial hurdles and lets you just build a game. Anyway, without further ado, my article:

-- pong
--   <3 frew

-- this is a simple pong game
-- written to learn pico-8 and
-- basic game programming.

-- the structure should be
-- fairly easy to understand,
-- but i'll write out some
-- notes in these comments to
-- help others learn.

------------------------------

-- first off, we have the
-- following two "player" or
-- "paddle" objects.  they have
-- six members each:
--
--  x      - the column
--  y      - the row
--  h      - the height
--  score
--  ai     - computer controlled
--  player - which player

l = {
  x      =  5,
  y      = 50,
  h      = 10,
  score  =  0,
  ai     = true,
  player = 1
}

r = {
  x      = 123,
  y      = 50,
  h      = 10,
  score  =  0,
  ai     = true,
  player = 0
}

-- this is the first really
-- interesting piece of code.
-- for a given player, it will
-- move the player up or down
-- if the ball is not directly
-- across from the center.
--
-- you could improve this code
-- in a few easy ways.  first
-- off, you could make it try
-- to hit the ball with the
-- edge of the paddle, which
-- is harder to anticipate.
-- you could also add some code
-- to make it move more
-- gracefully.  finally, you
-- could make it worse, so that
-- the player actually has a
-- chance!
function do_ai(p, b)
  if (b.y < p.y + p.h/2) then
     p.y -= 1
  elseif (b.y > p.y + p.h/2) then
     p.y += 1
  end
end

-- this is pretty obvious code,
-- except for one part.  the
-- main bit just moves the
-- piece up or down based on
-- the button pressed.  but it
-- additionally maintains the
-- 'ai' member of the player,
-- and automatically calls the
-- do_ai() function above if
-- the player is still an ai.
--
-- it might be fun to add a
-- button that would turn the
-- ai back on after a player
-- took over for the ai.
function update_player(p, b)
  if (btn(2, p.player) or btn(3, p.player)) then
    p.ai = false
  end

  if (not p.ai) then
    if (btn(2, p.player)) p.y -= 1
    if (btn(3, p.player)) p.y += 1
  else
    do_ai(p, b)
  end
end

-- not too complicated, move
-- the ball up and over in the
-- direction it is moving.
function update_ball(b)
  b.x += b.dx
  b.y += b.dy
end

-- this function just puts the
-- ball back in the middle
-- after a point is scored.
middle = r.y + r.h/2
function reset_ball(b)
  b.x  = 50
  b.y  = middle
  b.h  = 2
  b.dx = 1
  b.dy = 0
end

-- and we call it at the start
-- of the game too.
b = {}
reset_ball(b)

-- this is a pretty complex
-- function, but the code is
-- not that hard to understand.
function intersection(l, r, b)
  -- calc_angle will be true
  -- if a player hit the ball.
  calc_angle = false
  -- and p will be set to which
  -- player hit the ball.
  p = {}

  -- ball passed left paddle
  if (b.x < 0) then
     r.score += 1
     reset_ball(b)
  -- ball passed right paddle
  elseif (b.x > 128) then
     l.score += 1
     reset_ball(b)
  -- ball hit ceiling or floor
  elseif (
    b.y < 0 or b.y > 128) then
     b.dy = -b.dy
  -- ball hit left paddle
  elseif (b.x < l.x and
      b.y >= l.y - b.h and
      b.y <= l.y + l.h + b.h
     ) then
     b.dx = -b.dx
     calc_angle = true
     p = l
  -- ball hit right paddle
  elseif (b.x > r.x and
      b.y >= r.y - b.h and
      b.y <= r.y + r.h + b.h
     ) then
     b.dx = -b.dx
     calc_angle = true
     p = r
  end

  if (calc_angle) then
     -- every now and then
     -- increase ball speed
     if (rnd(1) > 0.9) then
       b.dx *= 1 + rnd(0.01)
     end

     -- this is complicated!
     -- the first line scales
     -- the location that the
     -- ball hit the paddle
     -- from zero to one.  so
     -- if the ball hit the
     -- paddle one third of the
     -- way from the top, it
     -- will be set to
     -- circa 0.3
     rl = (b.y - p.y)/p.h
     
     -- this basically makes it
     -- as if the paddle were
     -- part of a circle, so
     -- that bouncing off the
     -- middle is flat, the top
     -- is a sharp angle, and
     -- the bottom is a sharp
     -- angle.  i had to look
     -- up sin and cosine for
     -- this, but it might be
     -- just as easy to play
     -- with the numbers till
     -- you get what you want
     rl = rl / 2 + 0.25
     angle = sin(rl)

     b.dy = angle
     
     -- boop
     sfx(0)
  end
end

-- call all functions above
function _update()
  update_player(l, b)
  update_player(r, b)
  update_ball(b)

  intersection(l, r, b)
end

-- this is pong, everything
-- is basically a square :)
function drawshape(s)
  rectfill(s.x  , s.y    ,
           s.x+2, s.y+s.h, 7 )
end

function _draw()
  cls()
  drawshape(l)
  drawshape(r)
  drawshape(b)

  -- draw the dotted line in
  -- the middle of the field
  for i=0,30 do
    rectfill(64  , i*5  ,
             64+2, i*5+2,  7)
  end

  print(l.score, l.x + 5, 5)
  print(r.score, r.x - 5, 5)
end

Here is the actual cartridge; the code is embedded in the image:

[image: the pong cartridge]

Posted Wed, Dec 23, 2015

Farewell, CPAN Contest

In August I wrote about being tired of The CPAN Contest. I decided recently that once I hit 200 releases I’d stop and put my efforts elsewhere.

I am not giving up on CPAN or Perl, but I do not think timeboxed releases are best for individuals. I am, though, very pleased to be able to write, test, and document a new CPAN module over the course of a couple of hours.

Looking Back

Now seems like a good time to look back on the past few years, both before the contest and during it.

Here are some modules that I released before the contest started:

I also wrote over a hundred blog posts; some classics are:

And I did some other unreleased work, like:

Not bad! Here are some modules that I released during the contest:

And maybe one of the most interesting OSS things I’ve ever done: drinkup.

How The Sausage is Made / Thanks

There are a number of tools that make the overall process of releasing new or updated modules as simple as possible. A few spring to mind:

Dist::Zilla

Rik’s Dist::Zilla was by and large the most motivating and generally helpful tool in this process. No other tool even comes close to providing the build-time assistance that Dist::Zilla does. I remember being incredibly intimidated by Module::Install when I released my very first CPAN module (which I think I can look back on as a kind of lucky guess). The version that I used was recent for the time, but four major versions have been released since then!

On top of dzil I use a number of plugins, though not a huge number. If you want to see a definitive list, my current kit is shown here.

Github and a constellation of tools surrounding it

I have released open source code on a bunch of platforms. Until just now I’d never really considered how many. I’ve used all of

  • Sourceforge
  • Rubyforge
  • A Blog Post Containing All The Code
  • Google Code
  • Savannah
  • Github

I remember when I signed up for Savannah they told me: “How about you write your code first, and then you can host it here.” What a joke.

It’s crazy how many of those services are just gone now!

When I started using Github they didn’t even have issues; you had to use an ancillary service called Lighthouse. Anyway, Github provides a lot of awesome features, but for me it mostly boils down to:

  • I can create repos, forks, issues etc from the CLI (Using git hub)
  • I can easily see my personal “todo list” at https://github.com/issues

The former means that I don’t have to deal with a bloated browser or web interface for something I do so often. In fact, when I come up with an idea for a new project my current process is:

git hub repo-new frioux/My-Idea && git hub issue-new frioux/My-Idea

And then it’ll show up in that central place. Pretty cool huh?

Testing

Many have reasonably noted that CPAN Testers is one of the few things that Perl has and that no other community has yet emulated. While that’s true, for the vast majority of people actually testing on different platforms is overkill. For most of my modules I pay more attention to TravisCI, as it will test all major versions of Perl every time I push. Before each release I wait for the Travis tests to finish, just in case I missed some odd Perl 5.8 thing.

On top of that, I have a powerful Docker setup for DBIx::Class::Helpers that actually runs live tests against all of SQLite, mysql, PostgreSQL, and Oracle. If you care to, you can even set environment variables to point at a SQL Server instance as well, but I don’t do that and I suspect no one else does either.

What’s next?

I want to spend less time on libraries and more time on applications, for one. It would be great if I were able to finally finish and use drinkup, though as a parent I no longer have the time to really focus on cocktails like I used to.

I want to make some video games.

I want to get back to blogging on a weekly basis, whether the Iron Man software ever works or not.

I want to play more with weird languages like Rust and OCaml.

Most of all, I want to enjoy my limited free time. If I do decide to write a module and publish it, great, but I don’t want it to be a chore. I’d say most of the time when I release a new module it is fun and maybe at least a tiny bit useful, but there have been plenty of times when I’ve had to scramble to come up with something to release.

Posted Wed, Dec 16, 2015

PID Namespaces in Linux

One of the tools I wrote shortly after joining ZipRecruiter is for managing a Selenium test harness. It’s interesting because there are a lot of constraints related to total capacity of the host, desired speed of the test suite, and desired correctness of the codebase.

Anyway, one of the major issues I found was that if I stopped a test prematurely (with Ctrl-C, which sends a SIGINT) I’d end up with a bunch of orphaned workers. My initial idea was to just forward any signal the process received along to the child workers (minus some obvious ones like CHLD and WINCH), but that ended up causing problems, because the workers had many children of their own and those children did not handle the situation correctly either.

There are a couple of ways to do this. The first way is, I think, portable across unices; it involves giving the main process a TTY of its own. This will send all child processes (recursively, I think) a SIGHUP (as in, the TTY hung up) when the main process exits. Here’s the code; it’s pretty easy to do in Perl, though I have not gone through the effort of figuring out how to do it with vanilla shell.

use IO::Pty;

# allocate a pseudo-terminal and make its slave end the controlling
# terminal of this process; when this process exits, the terminal hangs
# up and the children get a SIGHUP
my $pty = IO::Pty->new;
$pty->make_slave_controlling_terminal;

The other way, which you may have guessed if you read the title of this post, is using a Linux PID namespace. In a PID namespace you basically start a process and it sees itself as PID 1 (aka init). All child processes are in the namespace as well and will have similarly “low” PIDs themselves. That is not really interesting for our use case. The interesting thing is that if PID 1 of a Linux PID namespace exits, every remaining process in the namespace gets a SIGKILL. Unlike SIGHUP, SIGKILL cannot be ignored, so the processes will definitely go away, and immediately (unless they are in uninterruptible sleep, I guess).

PID namespaces have been around for, like, forever (2008, which at this point is nearly eight years ago). The problem is that you can only create one as root, which is a hassle, to say the least. If you create a user namespace at the same time you do not need root, but that requires Linux 3.8, which is from 2013; pretty recent! Here’s an example of how to start a program in a PID namespace:

unshare --pid --user --mount --mount-proc --fork ./my-app

The first two flags should be obvious. The mount-related flags are so that the processes inside the namespace can read from /proc and find out whatever details they might need to know. If you are sure your processes never read from /proc you can safely elide those flags. The --fork flag is there because creating the PID namespace around the already-running process doesn’t really work; the new namespace only applies to children started afterwards.

unshare comes from the util-linux package, which any real (read: non-embedded) Linux distro ships with. To be clear: the above command is really lightweight. The meat of it clocks in at two system calls (unshare and clone). The whole thing adds about 5ms to the runtime. I think of unshare as a much more powerful fork. At some point I would like to make using it from within a language as easy as fork.
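
Until then, shelling out to unshare from a program is easy enough. Here is a hypothetical Perl sketch (the worker command is made up):

#!/usr/bin/perl
# hypothetical sketch: start a worker as PID 1 of a fresh PID namespace;
# when that worker exits, everything left inside the namespace is SIGKILLed
use strict;
use warnings;

exec 'unshare', '--pid', '--user', '--mount', '--mount-proc', '--fork',
   './my-worker'
      or die "couldn't exec unshare: $!";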

If you want to try this out, I’ve written more including some test scripts on github.

Posted Wed, Nov 25, 2015

Dream On Dreamer

I can’t speak for others, but I was pretty inspired as a teenager. What I’d do is read random stuff throughout the week, then listen to some kind of music or watch a movie on Friday, and do my best to stay up all night and use what I’d learned to make something new.

For the most part, as a teenager, I failed. As with most teenagers, I was pretty much worthless. But that’s part of what ended up making me who I am today!

I’m tired of using my free time feeling like I should be Accomplishing Something of Value to the Corporate World. I have six weeks to go before reaching 200 weeks in The CPAN Contest. That seems like a good time to stop.

After that, I expect to write more blog posts. The blog posts will be weird, maybe color coded. They will likely be less about code and more about Buckethead or A Really Good Movie.

Are you inspired? What inspires you? Why aren’t you inspired?

Dream on dreamer, starchild
Open your mind carefully
Dream on dreamer, ride home
Dreamshades changing endlessly

Enter worlds no one has seen
All at your command in your dreams

Sinking deeper, letting go
Let the story take you in
Dream on dreamer, fly home
Let the dream of dreams begin

Enter worlds no one has seen
All at your command in your dreams

Posted Sat, Nov 21, 2015

How I Integrated my blink(1) with PulseAudio

At work I wear some noise-cancelling earbuds. I do this because just twenty feet behind me there is a one-hundred-person sales team that sometimes claps, rings gongs, and is just generally loud. I also like to work to music, and it helps me focus.

My other coworkers all use large headphones, so they are used to being able to see at a glance if a given individual is listening to music. I thought it would be cool if I made a way to show that I was actually listening to something, so I wrote the following little script:

#!/usr/bin/perl

use strict;
use warnings;

while (1) {
   sleep 1;

   if (playing_sounds()) {
      warn "red\n";
      system 'blink1-tool', '--red';
   } else {
      warn "black\n";
      system 'blink1-tool', '--off';
   }
}

sub playing_sounds {
   my @lines =
      grep m/RUNNING/,
      split /\n/,
      qx(pacmd list-sink-inputs);

   warn "sound is playing\n" if @lines;
   warn "silence\n" if !@lines;

   scalar @lines
}

This very lightweight Perl script of about thirty lines simply makes my blink(1) red if any sound is playing on my machine, and turns it back off when there is none.

It works amazingly well and I think it is a perfect example of why it’s awesome to be a software engineer. I would like red to imply sound and to do something else with green and blue, but I do not yet have ideas for what those could mean. I was considering making green come on if someone sshes into my machine, but that’s crazy unlikely 😆. Pretty cool eh?

Posted Tue, Nov 17, 2015

Fast CLI Tools: gmail

I have been using commandline tools to interact with email for quite a while now. Basically there were two reasons:

  • I wanted to use GnuPG
  • gmail’s web interface became too slow

The former should be obvious; attempting to have secure communications in the context of a web browser is laughable.

The latter often surprises people. I think that if you pay a little more attention you’ll notice that gmail is clearly slower than local options. Not all local options, but the ones I’ll be discussing. 😄

For example, just loading gmail fresh takes 10s. Loading mutt takes less than 0.5s. In my email, searching for “station” takes 4s on the web interface. Using notmuch locally takes less than 2s, and of course since it’s a local machine, the second search is in a cache and thus is less than half that. On top of all of that, unlike modern web browsers, mutt never ends up taking up gigs of my memory.

Here are the tools that I use:

OfflineIMAP

My OfflineIMAP setup is fairly complex, because I’ve found that OfflineIMAP is a little bit buggy. You can read more about that on my OfflineIMAP Docker page. I’m very proud of this setup, but it still has a way to go before it’s as good as I want it to be.

Mutt

The main thing I’m pleased about with mutt, aside from the fact that I can and do trivially use vim as my editor, is the integration with notmuch. This mostly replaces all of the stuff I “lose” by not using gmail. So for example I can press F8 to search directly from within mutt and get a threaded view of the results. Similarly, if I press F9 I get the complete thread of the current email. To be clear, ordinarily if I archive some of the messages in mutt the thread will be incomplete or even broken, and my own messages will almost never be shown; the notmuch integration resolves that “lack.”

notmuch

I mostly went over what notmuch buys you in the Mutt section. I am a huge fan of notmuch. People need to factor out simple tools like this more often. Good job notmuch humans, I love your work.

Contact Sync Tools

This set of tools is actually why I’m writing this post. Before today, I would either use goobook integrated with mutt, or addrlookup (via my addrlookup-compat) on the console. First off: goobook is slow. If you reload your contacts (which happens automatically at least every 24h) you will be waiting about 4s for a “tab complete.” 4s is too long for anything interactive. On top of that, addrlookup can be similarly slow, and worse still, using two separate tools is annoying! Even searching, locally mind you, in my 150-entry address book with goobook query takes more than a second. Python programmers: do better.

So today I resolved these speed issues. First off, I just use an hourly cronjob (it could likely be daily, but my computer is rarely on all day and this isn’t resource intensive, so hourly seemed easier) to export all of the contacts addrlookup finds and then concatenate the contacts that goobook lists into a flat file. Then I wrote the smallest tool ever to filter that file; basically it’s just grep, though.
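
Something along these lines, roughly (a hypothetical sketch; the cache path is made up):

#!/usr/bin/perl
# hypothetical sketch of the flat-file contact filter: print every cached
# contact line that matches the query, case-insensitively
use strict;
use warnings;

my $query = shift or die "usage: $0 query\n";

open my $fh, '<', "$ENV{HOME}/.cache/contacts"
   or die "couldn't open contact cache: $!";

while (my $line = <$fh>) {
   print $line if $line =~ m/\Q$query\E/i;
}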

It’s of course super fast, currently taking 5ms per query. That’s well within my requirements.

Oh, and because goobook is not packaged for Ubuntu, I made a nice Docker container for it. This container is the first I’ve made that uses Alpine Linux, and I am impressed: the Ubuntu-based version would have been 300M, whereas the Alpine version is a mere 60M.


So that’s that! I have well-integrated, very fast commandline tools for all of my email needs. I can use all of these tools while disconnected from the internet, and they are faster than what Google can provide. I hope this helps you in your speedy endeavours.

colophon

This article was written in Santa Monica with vim and the excellent Goyo plugin.

Posted Sun, Nov 1, 2015