Prometheus Conveniences

At ZipRecruiter we are working towards migrating to Prometheus as a more modern monitoring solution. So far I have found it pretty pleasant.

Prometheus is a monitoring and alerting tool (or maybe more of a suite of tools). Since I’ve been at ZR we’ve used a combination of Icinga (to periodically poll things) and statsd (one of the older metrics aggregators) for alerting. Testing the old setup was frustrating, since the software is complicated and is managed by even more complicated software. I don’t really want to get into how they work, so I’ll just discuss how Prometheus works.

For starters, if you are writing in a threaded language, like Go or Java, you expose your metrics to Prometheus through an HTTP handler. If you want to verify that your metrics are being exposed as you intend, you just point your web browser (or curl) at something like http://localhost:8080/metrics. Already this is great; in the past we had engineers implement a variety of ways to capture the stats that our apps would send so that they could ensure things were working as intended.

If you are using a process-oriented language, like Perl or Python, you have more work to do. Chances are you will use either statsd_exporter or maybe the Prometheus Pushgateway. I haven’t interacted with those as much myself, but I am aware of how they run, and again, they are trivial to run locally so you can play with them and get comfortable with how things are functioning.
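
As a rough sketch of what talking to a locally running Pushgateway can look like, the Go program below posts one gauge in the text format to the /metrics/job/<job> endpoint. Treat it as an illustration under assumptions, not a recommended client: the address http://localhost:9091 assumes the Pushgateway default port, and the job name nightly_batch and the metric name are made up for the example.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// formatGauge renders a single gauge sample in the Prometheus text format.
func formatGauge(name string, value float64) string {
	v := strconv.FormatFloat(value, 'f', -1, 64)
	return fmt.Sprintf("# TYPE %s gauge\n%s %s\n", name, name, v)
}

// pushURL builds the Pushgateway URL for a job. It assumes the job name
// needs no escaping (no slashes or spaces).
func pushURL(base, job string) string {
	return strings.TrimRight(base, "/") + "/metrics/job/" + job
}

// push sends the payload with POST and treats any non-2xx status as an error.
func push(client *http.Client, base, job, payload string) error {
	resp, err := client.Post(pushURL(base, job), "text/plain", strings.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("pushgateway returned %s", resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Hypothetical metric: when the (made-up) nightly batch last succeeded.
	payload := formatGauge(
		"nightly_batch_last_success_timestamp_seconds",
		float64(time.Now().Unix()),
	)
	if err := push(client, "http://localhost:9091", "nightly_batch", payload); err != nil {
		log.Fatal(err)
	}
}
```

After a successful push, the metric should show up when you curl http://localhost:9091/metrics, which is the endpoint Prometheus itself would scrape.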

Next you probably want to experiment with writing alerts. There are two ways to do this. The first, which I would suggest, is to just run Prometheus locally, scraping your app, and experiment with alert rules directly. It’s a single Go binary with a basic web interface that you can use to see how things are functioning. If you want to do something more complex, or you don’t want to somehow break your app to see your alerts fire, you can use promtool, which ships with Prometheus, to feed various inputs to your rules and ensure that they would fire as intended.

It’s interesting to me that in this world where things are migrating to HTTP/2 and nothing is simple anymore, such a simple monitoring system is flourishing. I for one find it refreshing.

I hope to write about our automation around deploying new alert rules soon, but even before we get there I am very pleased with how this is shaping up.

(The following includes affiliate links.)

If you want to learn more about Prometheus, you might check out Prometheus: Up & Running.

Another option, which I have only glanced at so far, is Monitoring with Prometheus.

I have only spent a little time with these two books, but both of them have good stuff in them.

Posted Wed, Apr 10, 2019
