A visit to the Workshop: Hugo/Unix/Vim integration
I write a lot of little tools and take pride in thinking of myself as a toolsmith. This is hopefully the first post of many specifically highlighting the process of creating a new tool.
I wanted to do some tag normalization and tag pruning on my blog, to make the tags more useful (e.g. instead of having all of dbic, dbix-class, and dbixclass, just pick one). Here’s how I did it.
As mentioned previously, this blog is generated by Hugo. Hugo is excellent at generating static content; indeed, that is its raison d’être. But there are places where it does not do some of the things that a typical blogging engine would.
To normalize tags I wanted to look at tags with their counts, and then associated filenames for a given tag. If I were using WordPress I’d navigate around the web interface and click edit and this use case would be handled. Not for me though, because I want to avoid the use of my web browser if at all possible. It’s bloated, slow, and limited.
Anatomy of an Article
Before I go much further, here is a super quick primer on what an article looks like in Hugo:
---
aliases: ["/archives/984"]
title: "Previous Post Updated"
date: "2009-07-24T00:59:37-05:00"
tags: ["book", "catalyst", "perl", "update"]
guid: "http://blog.afoolishmanifesto.com/?p=984"
---
Sorry about that guys, I didn't use **links** to make it clear which book I was
talking about. Usually I do that kind of stuff but the internet was sucky
(fixed!) so it hurt to look up links. Enjoy?
The top part is YAML. Hugo supports lots of different metadata formats, but all of my posts use YAML. The part after the closing --- is the content, which is simply Markdown.
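That counting-the-fences split is easy to sketch outside of any tool; here is a one-line awk version of it (the sample post is made up for the demo):

```shell
# Hypothetical sample post; the awk program is the point: print lines
# until the second `---` fence, i.e. just the YAML front matter.
printf -- '---\ntitle: "Example"\n---\nbody text\n' > example.md

awk '/^---$/ { c++ } c < 2' example.md
# prints:
# ---
# title: "Example"
```

The same increment-then-test trick shows up in the Perl below.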
Unix Style Tools
My first run at this general problem was to build a few simple tools. Here’s the one that would extract the metadata:
#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;

for my $file (@ARGV) {
   open my $fh, '<', $file;

   my $cnt = 0;
   while (<$fh>) {
      $cnt++ if $_ eq "---\n";
      print $_ if $cnt < 2;
   }
}
The above returns the YAML part, which can then be consumed by a tool with a YAML parser.
Then I built a tool on top of that, called tag-count:
#!/usr/bin/env perl

use 5.22.0;
use warnings;
use sort 'stable';
use experimental 'postderef';

use YAML;

my $yaml = `bin/metadata content/posts/*`;
my @all_data = Load($yaml);

my @tags = map(($_->{tags}||[])->@*, @all_data);

my %tags;
$tags{$_}++ for @tags;

for (sort { $tags{$b} <=> $tags{$a} } sort keys %tags) {
   printf "%3d $_\n", $tags{$_};
}
That works, but it’s somewhat inflexible. When I thought about how I wanted to get the filenames for a given tag I decided I’d need to modify the metadata script, or make the calling script a lot more intelligent.
Advanced Unix Tools
So the metadata extractor turned out to be too simple. At some point I had the realization that what I really wanted was a database of data about my posts that I could query with SQL. Tools built on top of that would be straightforward to build and their function would be clear.
So I whipped up what I call q:
#!/usr/bin/env perl

use 5.22.0;
use warnings;
use autodie;
use experimental 'postderef';

use DBI;
use File::Find::Rule;
use Getopt::Long;
use YAML::XS 'Load';

my $sql;
my $formatter;

GetOptions (
   'sql=s'       => \$sql,
   'formatter=s' => \$formatter,
) or die("Error in command line arguments\n");

# build schema
my $dbh = DBI->connect('dbi:SQLite::memory:', undef, undef, {
   RaiseError => 1,
});

$dbh->do(<<'SQL');
CREATE TABLE articles (
   title,
   date,
   guid,
   filename
)
SQL

$dbh->do(<<'SQL');
CREATE TABLE article_tag ( guid, tag )
SQL

$dbh->do(<<'SQL');
CREATE VIEW _ ( guid, title, date, filename, tag ) AS
SELECT a.guid, title, date, filename, tag
FROM articles a
JOIN article_tag at ON a.guid = at.guid
SQL

# populate schema
for my $file (File::Find::Rule->file->name('*.md')->in('content')) {
   open my $fh, '<', $file;

   my $cnt = 0;
   my $yaml = "";
   while (<$fh>) {
      $cnt++ if $_ eq "---\n";
      $yaml .= $_ if $cnt < 2;
   }
   my $data = Load($yaml);
   $data->{tags} ||= [];

   $dbh->do(<<'SQL', undef, $data->{guid}, $data->{title}, $data->{date}, $file);
INSERT INTO articles (guid, title, date, filename) VALUES (?, ?, ?, ?)
SQL

   $dbh->do(<<'SQL', undef, $data->{guid}, $_) for $data->{tags}->@*;
INSERT INTO article_tag (guid, tag) VALUES (?, ?)
SQL
}

# run sql
my $sth = $dbh->prepare($sql || die "pass some SQL yo\n");
$sth->execute(@ARGV);

# show output
for my $row ($sth->fetchall_arrayref({})->@*) {
   my $code = $formatter || 'join "\t", map $r{$_}, sort keys %r';
   say((sub { my %r = $_[0]->%*; eval $code })->($row));
}
With less than 80 lines of code I have a super flexible tool for querying my corpus! Here are the two tools mentioned above, as q scripts:

bin/tag-count:
#!/bin/dash
exec bin/q \
--sql 'SELECT COUNT(*) AS c, tag FROM _ GROUP BY tag ORDER BY COUNT(*), tag' \
--formatter 'sprintf "%3d %s", $r{c}, $r{tag}'
bin/tag-files:
#!/bin/dash
exec bin/q --sql "SELECT filename FROM _ WHERE tag = ?" -- "$1"
And then this one, which I was especially pleased with because it was a use case I came up with after building q.

bin/chronological:
#!/bin/dash
exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
   --formatter 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'
I’m pleasantly surprised that this is fast. All of the above take under 150ms, even though the database is not persistent across runs.
Vim integration
Next I wanted to integrate q into Vim, so that when I wanted to see all posts tagged vim (or whatever) I could easily do so from within the current editor instance instead of spawning a new one.
:Tagged
To be clear, the simple way, where you spawn a new instance, is easily achieved like this:
$ vi $(bin/tag-files vim)
But I wanted to do that from within Vim. I came up with some functions and commands to do what I wanted, but it was fairly painful. Vim is powerful, but it gets weird fast. Here’s how I made a :Tagged vim command:
function Tagged(tag)
   execute 'args `bin/tag-files ' . a:tag . '`'
endfunction

command -nargs=1 Tagged call Tagged('<args>')
:execute is a kind of eval. In Vim there are a lot of different execution contexts and each one needs its own kind of eval; this is the Ex-mode eval.
:args {arglist} simply sets the argument list. And the magic above is that surrounding a string with backticks causes the command to be executed and the output interpolated, just like in shell or Perl.
I also added a window local version, using :arglocal:
function TLagged(tag)
   exe 'arglocal `bin/tag-files ' . a:tag . '`'
endfunction

command -nargs=1 TLagged call TLagged('<args>')
:Chrono
I also used the quickfix technique I blogged about before, because it comes with a nice, easy-to-use window (see :cwindow), and I added a caption to each file. I did it for the chronological tool, since that ends up being the largest possible list of posts; making it easier to navigate is well worth it. Here’s the backing script:
#!/bin/dash
exec bin/q --sql 'SELECT filename, title, date FROM articles ORDER BY date DESC' \
   --formatter 'my ($d) = split /T/, $r{date}; "$r{filename}:1:$d $r{title}"'
and then the vim command is simply:
command Chrono cexpr system('bin/quick-chrono')
:TaggedWord
Another command I added is called :TaggedWord. It takes the word under the cursor and loads all of the files with that tag into the argument list. If I can figure out how to bake it into CTRL-] (or something else like it) I will, as that would be more natural.
function TaggedWord()
   " add `-` as a "word" character
   set iskeyword+=45

   " save the current value of the @m register
   let l:tmp = @m

   normal "myiw
   call Tagged(@m)

   " restore
   set iskeyword-=45
   let @m = l:tmp
endfunction

command TaggedWord call TaggedWord()
I also made a local version of that, but I’ll leave its definition as an exercise for the reader.
Tag Completion
As a final cherry on top I added a completion function for tags. This is probably the most user-friendly way I can keep using the right tags. When I write a post, and start typing tags, existing tags will autocomplete and thus will be more likely to be selected than to be duplicated. It’s not perfect, but it’s pretty good. Here’s the code:
au FileType markdown execute 'setlocal omnifunc=CompleteTags'

function! CompleteTags(findstart, base)
   " This is almost purely cargo culted from the vim doc
   if a:findstart
      let line = getline('.')
      let start = col('.') - 1

      " tags are word characters and -
      while start > 0 && line[start - 1] =~ '\w\|-'
         let start -= 1
      endwhile
      return start
   else
      " only run the command if we are on the "tags: [...]" line
      if match(getline('.'), "tags:") == -1
         return []
      endif

      " get list of tags that have current base as a prefix
      return systemlist('bin/tags ' . a:base . '%')
   endif
endfun
And here’s the referenced bin/tags:
#!/bin/dash
match="${1:-%}"
bin/q --sql 'SELECT tag FROM article_tag WHERE tag LIKE ? GROUP BY tag' -- "$match"
This little excursion was a lot of fun for me. I’ve always thought that Vim’s completion was black magic, but it’s really not. And the lightbulb moment about building an in-memory SQLite database was particularly rewarding. I hope I’ve inspired readers to write some tools as well; go forth and write!
(The following includes affiliate links.)
If you’d like to learn more about Vim, I can recommend two excellent books. I first learned how to use vi from Learning the vi and Vim Editors. The new edition has a lot more information and spends more time on Vim-specific features. It was helpful for me at the time; the fundamental model of vi is still well supported in Vim, and this book explores it well.
Second, if you really want to up your editing game, check out Practical Vim. It’s a very approachable book that unpacks some of the lesser used features in ways that will be clearly and immediately useful. I periodically review this book because it’s such a treasure trove of clear hints and tips.
Posted Wed, Jul 20, 2016