A visit to the Workshop: Hugo/Unix/Vim integration

I write a lot of little tools and take pride in thinking of myself as a toolsmith. This is the first of what I hope will be many posts highlighting the process of creating a new tool.

I wanted to normalize and prune the tags on my blog to make them more useful (e.g. instead of having dbic, dbix-class, and dbixclass, just pick one). Here’s how I did it.

As mentioned previously, this blog is generated by Hugo. Hugo is excellent at generating static content; indeed, that is its raison d’être. But there are places where it does not do some of the things that a typical blogging engine would.

To normalize tags I wanted to see every tag with its count, and then the filenames associated with a given tag. If I were using WordPress, I’d navigate around the web interface, click edit, and this use case would be handled. Not for me, though: I want to avoid using my web browser if at all possible. It’s bloated, slow, and limited.

🔗 Anatomy of an Article

Before I go much further, here is a super quick primer on what an article looks like in Hugo:

---
aliases: ["/archives/984"]
title: "Previous Post Updated"
date: "2009-07-24T00:59:37-05:00"
tags: ["book", "catalyst", "perl", "update"]
guid: "http://blog.afoolishmanifesto.com/?p=984"
---
Sorry about that guys, I didn't use **links** to make it clear which book I was
talking about. Usually I do that kind of stuff but the internet was sucky
(fixed!) so it hurt to look up links. Enjoy?

The top part, between the --- lines, is YAML front matter. Hugo supports several metadata formats, but all of my posts use YAML. Everything after the closing --- is the content, which is plain Markdown.
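The following minimal core-Perl sketch makes that layout concrete by splitting a post into its two parts. It assumes, as in the example above, that the file starts with a `---` line and that the next `---` line closes the front matter. `split_post` is a hypothetical helper and does not come from Hugo.

```perl
use strict;
use warnings;

# Split the text of a post into (YAML front matter, Markdown body).
# Assumes the file starts with a line that is exactly "---" and that the
# next "---" line closes the front matter; later "---" lines stay in the body.
sub split_post {
    my ($text) = @_;
    my ($yaml, $body) = $text =~ /\A---\n(.*?)^---\n(.*)\z/ms
        or die "no front matter found\n";
    return ($yaml, $body);
}

my $post = <<'POST';
---
title: "Previous Post Updated"
tags: ["book", "catalyst", "perl", "update"]
---
Sorry about that guys, I didn't use **links** to make it clear which book I was
POST

my ($yaml, $body) = split_post($post);
print "YAML:\n$yaml";
print "Markdown:\n$body";
```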

🔗 Unix Style Tools

My first run at this general problem was to build a few simple tools. Here’s the one that would extract the metadata:

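#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;

# Print the YAML front matter of each file named on the command line:
# everything from the opening --- up to (but not including) the closing ---.
for my $file (@ARGV) {
  open my $fh, '<', $file;
  my $cnt = 0;
  while (<$fh>) {
    $cnt++ if $_ eq "---\n";
    last if $cnt >= 2;    # the front matter is done; skip the body
    print $_;
  }
}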

The above prints the YAML part of each file. Because every chunk starts with ---, the concatenated output is a multi-document YAML stream that a tool with a YAML parser can consume.

Then I built a tool on top of that, called tag-count:

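#!/usr/bin/env perl

use 5.22.0;
use warnings;

use sort 'stable';

use experimental 'postderef';

use YAML;

# bin/metadata emits one YAML document per post; Load returns them all
my $yaml = `bin/metadata content/posts/*`;
my @all_data = Load($yaml);

# flatten every post's tag list (untagged posts contribute nothing)
my @tags = map(($_->{tags}||[])->@*, @all_data);
my %tags;

$tags{$_}++ for @tags;

# sort alphabetically, then stably by descending count so ties stay alphabetical
for (sort { $tags{$b} <=> $tags{$a} } sort keys %tags) {
   # pass the tag as an argument so a % in a tag can't break the format
   printf "%3d %s\n", $tags{$_}, $_;
}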

That works, but it’s somewhat inflexible. When I thought about how to get the filenames for a given tag, I realized the metadata output doesn’t say which file each chunk came from. I’d need to either modify the metadata script or make the calling script a lot more intelligent.
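Here is a minimal sketch of the first option. It assumes the extractor injects a hypothetical `filename` key right after each opening `---`; no such key exists in the real posts. It also assumes file names contain no double quotes or backslashes. With that change, every YAML document in the stream carries its own origin:

```perl
use strict;
use warnings;

# Return one post's front matter with a hypothetical "filename" key injected
# right after the opening ---, so a YAML consumer knows where it came from.
# Assumes the file name contains no double quotes or backslashes.
sub metadata_with_filename {
    my ($file, $text) = @_;
    my ($cnt, $out) = (0, '');
    for my $line (split /^/m, $text) {
        $cnt++ if $line eq "---\n";
        last if $cnt >= 2;               # closing --- ends the front matter
        $out .= $line;
        $out .= qq{filename: "$file"\n}  # only right after the opening ---
            if $cnt == 1 && $line eq "---\n";
    }
    return $out;
}

print metadata_with_filename('content/posts/a.md',
    "---\ntitle: \"A\"\ntags: [\"perl\"]\n---\nbody\n");
```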

🔗 Advanced Unix Tools

So the metadata extractor turned out to be too simple. At some point I realized that what I really wanted was a database of metadata about my posts that I could query with SQL. Tools built on top of that would be straightforward to write, and their function would be clear.
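The following rough core-Perl sketch shows why a database fits. It gives the data the shape it takes once normalized: one record per article and one row per (guid, tag) pair. The sample posts are made up. `files_tagged` is a hypothetical stand-in for the join that such a database makes trivial.

```perl
use strict;
use warnings;

# Made-up sample posts, for illustration only
my @posts = (
    { guid => 'p1', filename => 'content/posts/a.md', tags => ['perl', 'dbic'] },
    { guid => 'p2', filename => 'content/posts/b.md', tags => ['vim'] },
    { guid => 'p3', filename => 'content/posts/c.md', tags => ['perl'] },
);

# "articles" table: guid => filename
my %articles = map { $_->{guid} => $_->{filename} } @posts;

# "article_tag" table: one [guid, tag] row per tag on each post
my @article_tag = map {
    my $p = $_;
    map { [ $p->{guid}, $_ ] } @{ $p->{tags} };
} @posts;

# Rough equivalent of:
#   SELECT filename FROM articles a JOIN article_tag at
#   ON a.guid = at.guid WHERE tag = ?
sub files_tagged {
    my ($tag) = @_;
    return map { $articles{ $_->[0] } } grep { $_->[1] eq $tag } @article_tag;
}

print "$_\n" for files_tagged('perl');
```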

So I whipped up what I call q:

#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;
use experimental 'postderef';

use DBI;
use File::Find::Rule;
use Getopt::Long;
use YAML::XS 'Load';

my $sql;
my $formatter;
GetOptions (
   'sql=s' => \$sql,
   'formatter=s' => \$formatter,
) or die("Error in command line arguments\n");

# build schema in an in-memory SQLite database; DBI->connect takes
# ($dsn, $user, $password, \%attr), so the attributes go fourth
my $dbh = DBI->connect('dbi:SQLite::memory:', '', '', {
      RaiseError => 1,
});

$dbh->do(<<'SQL');
   CREATE TABLE articles (
      title,
      date,
      guid,
      filename
   )
SQL

$dbh->do(<<'SQL');
   CREATE TABLE article_tag ( guid, tag )
SQL

# _ is a convenience view: one row per (article, tag) pair
$dbh->do(<<'SQL');
   CREATE VIEW _ ( guid, title, date, filename, tag ) AS
   SELECT a.guid, title, date, filename, tag
   FROM articles a
   JOIN article_tag at ON a.guid = at.guid
SQL

# populate schema from the YAML front matter of every post
for my $file (File::Find::Rule->file->name('*.md')->in('content')) {
  open my $fh, '<', $file;
  my $cnt = 0;
  my $yaml = "";
  while (<$fh>) {
    $cnt ++ if $_ eq "---\n";
    $yaml .= $_ if $cnt < 2
  }
  my $data = Load($yaml);
  $data->{tags} ||= [];

  $dbh->do(<<'SQL', undef, $data->{guid}, $data->{title}, $data->{date}, $file);
      INSERT INTO articles (guid, title, date, filename) VALUES (?, ?, ?, ?)
SQL

  $dbh->do(<<'SQL', undef, $data->{guid}, $_) for $data->{tags}->@*;
      INSERT INTO article_tag (guid, tag) VALUES (?, ?)
SQL
}

# run sql; leftover command line arguments become bind values
my $sth = $dbh->prepare($sql || die "pass some SQL yo\n");
$sth->execute(@ARGV);

# show output: eval the formatter with %r holding each row (column => value)
my $code = $formatter || 'join "\t", map $r{$_}, sort keys %r';
for my $row ($sth->fetchall_arrayref({})->@*) {
   say((sub { my %r = $_[0]->%*; eval $code })->($row))
}

In fewer than 80 lines of code I have a super flexible tool for querying my corpus! Here are the two tools mentioned above, rewritten as q scripts:

bin/tag_count:

#!/bin/dash

exec bin/q \
   --sql 'SELECT COUNT(*) AS c, tag FROM _ GROUP BY tag ORDER BY COUNT(*), tag' \
   --formatter 'sprintf "%3d  %s", $r{c}, $r{tag}'

bin/tag-files:

#!/bin/dash

exec bin/q --sql "SELECT filename FROM _ WHERE tag = ?" -- "$1"

And then there’s this one, which I was especially pleased with because it’s a use case I only came up with after building q.

bin/chronological:

#!/bin/dash

exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
      --format 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'
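The `--formatter` string is ordinary Perl. q evals it inside a small sub where `%r` holds the current row, keyed by column name. (`--format` works because Getopt::Long accepts unambiguous abbreviations of option names by default.) Below is a standalone sketch of that mechanism, using a made-up row shaped like the one bin/chronological selects. `format_row` is a hypothetical name.

```perl
use strict;
use warnings;

# Mimic how q applies --formatter: the code string is eval'ed in a scope
# where the lexical %r holds the current row, and its value is the output.
sub format_row {
    my ($code, $row) = @_;
    my %r = %$row;
    my $out = eval $code;
    die "formatter failed: $@" if $@;
    return $out;
}

# A made-up row shaped like the SELECT in bin/chronological
my %row = (
    filename => 'content/posts/previous-post-updated.md',
    title    => 'Previous Post Updated',
    date     => '2009-07-24T00:59:37-05:00',
);

print format_row(
    'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"',
    \%row,
), "\n";
```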

I’m pleasantly surprised by how fast this is: all of the above run in under 150ms, even though the database is rebuilt from scratch on every run.

🔗 Vim integration

Next I wanted to integrate q into Vim, so that when I wanted to see all posts tagged vim (or whatever) I could easily do so from within the current editor instance instead of spawning a new one.

🔗 :Tagged

To be clear, the simple way, where you spawn a new instance, is easily achieved like this:

$ vi $(bin/tag-files vim)

But I wanted to do that from within Vim. I came up with some functions and commands to do what I wanted, but it was fairly painful; Vim is powerful, but it gets weird fast. Here’s how I made a :Tagged Vim command:

" :Tagged {tag} replaces the argument list with the files that have {tag}
function! Tagged(tag)
  execute 'args `bin/tag-files ' . a:tag . '`'
endfunction
command! -nargs=1 Tagged call Tagged(<q-args>)

:execute is a kind of eval. In Vim there are a lot of different execution contexts, and each one needs its own kind of eval; this is the Ex-mode eval. :args {arglist} simply sets the argument list. And the magic above is that surrounding a string with backticks causes the command to be executed and its output interpolated, just like in shell or Perl.

I also added a window-local version, using :arglocal:

" :TLagged {tag} does the same, but with a window-local argument list
function! TLagged(tag)
  exe 'arglocal `bin/tag-files ' . a:tag . '`'
endfunction
command! -nargs=1 TLagged call TLagged(<q-args>)

🔗 :Chrono

I also used the quickfix technique I blogged about before, because it comes with a nice, easy-to-use window (see :cwindow) and lets me add a caption to each file. I did it for the chronological tool, since that ends up being the largest possible list of posts, so making it easier to navigate is well worth it. Here’s the backing script:

#!/bin/dash

exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
           --format 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'

and then the Vim command is simply:

command! Chrono cexpr system('bin/quick-chrono')
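The `filename:1:caption` lines work because Vim's default 'errorformat' includes a `%f:%l:%m` entry (file, line number, message). Each line therefore becomes a quickfix entry pointing at line 1 of the post, with the date and title as its text. Below is a rough Perl approximation of that split, assuming file names contain no colons. `parse_qf_line` is hypothetical and far simpler than Vim's real errorformat matching.

```perl
use strict;
use warnings;

# Roughly how a %f:%l:%m errorformat entry splits one line of
# bin/quick-chrono output: the file name up to the first colon, a line
# number, then everything else (colons in titles included) as the message.
sub parse_qf_line {
    my ($line) = @_;
    my ($file, $lnum, $msg) = $line =~ /\A([^:]+):(\d+):(.*)\z/
        or return;
    return { file => $file, lnum => $lnum, text => $msg };
}

my $entry = parse_qf_line(
    'content/posts/previous-post-updated.md:1:2009-07-24 Previous Post Updated');
print "$entry->{file} line $entry->{lnum}: $entry->{text}\n";
```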

🔗 :TaggedWord

Another command I added is called :TaggedWord. It takes the word under the cursor and loads all of the files with that tag into the argument list. If I can figure out how to bake it into CTRL-] (or something like it), I will, as that would be more natural.

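function! TaggedWord()
  " save the buffer-local 'iskeyword' and the current value of the @m register
  let l:isk = &l:iskeyword
  let l:tmp = @m
  " add `-` (ASCII 45) as a "word" character so dbix-class is one word
  setlocal iskeyword+=45
  normal! "myiw
  let l:word = @m
  " restore both before Tagged() switches to another buffer
  let &l:iskeyword = l:isk
  let @m = l:tmp
  call Tagged(l:word)
endfunction
command! TaggedWord call TaggedWord()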

I also made a local version of that, but I’ll leave the definition of that one to the reader as an exercise.

🔗 Tag Completion

As a final cherry on top, I added a completion function for tags. This is probably the most user-friendly way to make sure I keep using the right tags. When I write a post and start typing tags, existing tags autocomplete, so they’re more likely to be reused than duplicated. It’s not perfect, but it’s pretty good. Here’s the code:

autocmd FileType markdown setlocal omnifunc=CompleteTags
function! CompleteTags(findstart, base)
  " This is almost purely cargo culted from the Vim docs (:help complete-functions)
  if a:findstart
    let line = getline('.')
    let start = col('.') - 1
    " tags are word characters and -
    while start > 0 && line[start - 1] =~ '\w\|-'
      let start -= 1
    endwhile
    return start
  else
    " only run the command if we are on the "tags: [...]" line
    if match(getline('.'), "tags:") == -1
      return []
    endif

    " get list of tags that have current base as a prefix;
    " shellescape() keeps spaces or quotes in the base away from the shell
    return systemlist('bin/tags ' . shellescape(a:base . '%'))
  endif
endfun
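The findstart branch is the only subtle part. Vim calls the function once to learn where the word being completed begins, then calls it again with that text as `a:base`. Below is a small Perl port of the backward scan, using the same rule that tags are word characters and `-`. `completion_start` is a hypothetical name, and `$col` is 1-based like `col('.')`.

```perl
use strict;
use warnings;

# Port of the findstart branch: starting just left of the cursor, walk back
# while the previous character is a word character or '-', and return the
# 0-based column where the tag being typed begins.
sub completion_start {
    my ($line, $col) = @_;     # $col is 1-based, like Vim's col('.')
    my $start = $col - 1;
    $start-- while $start > 0 && substr($line, $start - 1, 1) =~ /[\w-]/;
    return $start;
}

# Cursor at the end of the line, just after the partial tag "dbix-cl"
my $line  = 'tags: ["perl", "dbix-cl';
my $start = completion_start($line, length($line) + 1);
print substr($line, $start), "\n";   # the base Vim passes on the second call
```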

And here’s the referenced bin/tags:

#!/bin/dash

match="${1:-%}"
bin/q --sql 'SELECT tag FROM article_tag WHERE tag LIKE ? GROUP BY tag' -- "$match"
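The `LIKE ?` with a trailing `%` is a prefix match, and two SQLite details are worth knowing here. First, LIKE is case-insensitive for ASCII letters by default. Second, a `_` or `%` in the typed text acts as a wildcard rather than a literal. Below is a rough Perl emulation of that matching for ASCII tags; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Rough emulation of SQLite's  tag LIKE 'base%'  for ASCII tags: '%' matches
# any run of characters, '_' matches exactly one character, and everything
# else matches literally but case-insensitively (SQLite's default for ASCII).
sub like_to_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $ch (split //, $pattern) {
        $re .= $ch eq '%' ? '.*'
             : $ch eq '_' ? '.'
             :              quotemeta($ch);
    }
    return qr/\A$re\z/is;
}

# Distinct tags matching "$base%", roughly what bin/tags returns
sub tags_matching {
    my ($base, @tags) = @_;
    my $re = like_to_regex($base . '%');
    my %seen;
    my @hits = grep { /$re/ && !$seen{$_}++ } @tags;
    return sort @hits;
}

print "$_\n" for tags_matching('db', qw(dbic dbix-class perl DBI dbic));
```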

This little excursion was a lot of fun for me. I’d always thought that Vim’s completion was black magic, but it’s really not. And the lightbulb moment about building an in-memory SQLite database was particularly rewarding. I hope this inspires you to write some tools of your own; go forth and write!

(The following includes affiliate links.)

If you’d like to learn more about Vim, I can recommend two excellent books. I first learned how to use vi from Learning the vi and Vim Editors. The new edition has a lot more information and spends more time on Vim-specific features. It was helpful for me at the time, and since the fundamental model of vi is still well supported in Vim, this book’s thorough treatment of that model remains valuable.

Second, if you really want to up your editing game, check out Practical Vim. It’s a very approachable book that unpacks some of the lesser-used features in ways that are clearly and immediately useful. I periodically review this book because it’s such a treasure trove of clear hints and tips.

Posted Wed, Jul 20, 2016
