Unreliable Cronjobs

At work we’ve been working on monitoring our cronjobs better; armed with some of that knowledge, I have made some incredibly unreliable cronjobs much more reliable.

The general pattern that we suggest at work is to run your cronjobs five to ten times more often than you need to and to exit early if there is no work to do. In addition, you should monitor what the cronjob produces (which obviously varies wildly per cronjob) rather than the synthetic exit code or output of the cronjob. This both helps you avoid being paged when a cronjob is a little flaky and detects a cronjob that is failing but still exiting zero.

Given this knowledge I decided to apply it to the least reliable host I know of: my laptop. I have a handful of cronjobs that I want to succeed either hourly or daily, but I can’t assume my laptop will be running at any given time of day. Here’s the pattern I settled on: for jobs that should succeed daily, run hourly; for jobs that should succeed hourly, run them every minute. Nearly all my jobs produce a file on disk, so I add a little header to each job:

older-than "$OUTPUT_FILE" m 1h || exit

That uses older-than, which allows basic time expressions against a file system time (atime, mtime, or ctime).
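If you don't have older-than handy, the mtime case can be roughly approximated with find. This is only a sketch and not how older-than itself works: the function name mtime_older_than is made up for illustration, it takes the age in minutes instead of a time expression like 1h, and it assumes a find that supports -mmin (GNU and BSD find do, but it is not in POSIX). It also assumes that a missing output file should count as stale so the job runs the first time.

```shell
#!/bin/sh
# Hypothetical stand-in for the mtime case of older-than; not the real tool.
# Usage: mtime_older_than FILE MINUTES
# Exits 0 if FILE is missing or was last modified more than MINUTES ago.
mtime_older_than() {
    file=$1
    minutes=$2

    # Assumption: no output file yet means there is work to do.
    [ -e "$file" ] || return 0

    # find prints the path only when its mtime is more than $minutes old;
    # -prune keeps find from descending if FILE is a directory.
    [ -n "$(find "$file" -prune -mmin "+$minutes")" ]
}
```

A job header built on this might read mtime_older_than "$OUTPUT_FILE" 60 || exit 0, which skips the run when the output is less than an hour old.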

Ok, now the cronjobs are more reliable because they run more often and have more chances to succeed. But what about cronjobs that are broken because, for example, some auth token rotated and I never noticed? I have a stupid system for checking these files and notifying me if they are too old. Basically I run a checking script in a while loop every 5 minutes. The while loop is started when I log in, and if it died for some reason I would be blind, but that is pretty unlikely.
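A loop like that could be sketched as follows. The function name run_checks_every and the check script path in the usage note are hypothetical; the only real assumption is that whatever command you pass performs all of the staleness checks once and returns.

```shell
#!/bin/sh
# Sketch of a login-time watchdog loop; the names here are illustrative.
# Usage: run_checks_every SECONDS COMMAND [ARGS...]
run_checks_every() {
    interval=$1
    shift

    while true; do
        # Ignore the check's exit status so a failing check can't end the
        # loop, even if the surrounding script runs under set -e.
        "$@" || true
        sleep "$interval"
    done
}
```

From a login script this might be started in the background with something like run_checks_every 300 "$HOME/bin/check-cron-outputs" &, where check-cron-outputs stands in for whatever script holds the notification commands.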

One of my scripts will periodically sync addresses to a local thing mutt can read. I don’t need it to be super up-to-date, but if it’s been broken for a couple days I want to know so it’s not six months till I notice. Here’s my notification command:

older-than m 2d "$HOME/personal-addresses" && \
   notify-send -u critical "personal-addresses is too old"

This will put a little red notification at the top right of my screen, and because it’s “critical” it won’t go away till I click it.

If you wanna glue together little things like the above, you might be interested in Wicked Cool Shell Scripts. I never know if books are too advanced or too basic, but check it out; maybe it’s your speed.

A related topic, to me, is extending my editor. If you want to go all the way with that and, like me, use Vim, you might want to grab a copy of Learn Vimscript the Hard Way.

Posted Tue, Jun 25, 2019
