At ZipRecruiter we are working towards migrating to Prometheus as a more modern monitoring solution. I have found it pretty pleasant so far.
Prometheus is a monitoring and alerting tool (or maybe more of a suite of tools). Since I’ve been at ZR we’ve used a combination of Icinga (to periodically poll things) and statsd (one of the older metrics aggregation daemons) for alerting. Testing the old setup was frustrating, since the software is complicated and is managed by even more complicated software. I don’t really want to get into how they work, so I’ll just discuss how Prometheus works.
For starters, if you are writing in a threaded language, like Go or Java, you
expose the metrics you want to Prometheus over an HTTP handler. If you want to
verify that your metrics are being exposed as you intend, you can point your
web browser (or any other HTTP client) at
http://localhost:8080/metrics. Already this is
great; in the past we had engineers implement a variety of ways to capture the
stats that our apps would send so that they could ensure things were working as
intended.
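To make the "metrics over an HTTP handler" idea concrete, here is a minimal sketch using only the Python standard library. It is not the official Prometheus client (which is what you would normally reach for); it just hand-renders a counter in the Prometheus text exposition format so you can see what a scrape returns. The metric name `myapp_requests_total` and the helpers `render_metrics`, `serve`, and `fetch_metrics` are made up for illustration, and the port matches the localhost:8080 URL above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counters; a real app would use a client library.
COUNTERS = {"myapp_requests_total": 0}
LOCK = threading.Lock()


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with LOCK:
            body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port=8080):
    """Start the metrics server on a background thread and return it."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_metrics(port=8080):
    """Do what a browser would: GET /metrics and return the body."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    server = serve()
    with LOCK:
        COUNTERS["myapp_requests_total"] += 1
    print(fetch_metrics(), end="")
    server.shutdown()
    server.server_close()
```

The point is less the code than the feedback loop: whatever the handler prints is exactly what Prometheus will scrape, so checking it is one HTTP GET away.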
If you are using a process-oriented language, like Perl or Python, you have more
work to do. Chances are you will use either
statsd_exporter or maybe the
Prometheus Pushgateway. I haven’t interacted with those as much myself, but I
am aware of how they run and again, they are trivial to run locally so you can
play with them and get comfortable with how things are functioning.
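As a rough illustration of the statsd_exporter route, the sketch below sends a counter increment in the statsd line protocol over UDP, which is all a short-lived worker process needs to do. The metric name `myapp.jobs_processed` and the helper names are hypothetical, and the target address assumes a statsd_exporter running locally and listening on UDP port 9125; adjust it to wherever yours actually listens.

```python
import socket


def statsd_counter_line(name, value=1):
    """Format a counter increment in the statsd line protocol."""
    return f"{name}:{value}|c"


def send_statsd(line, host="127.0.0.1", port=9125):
    """Fire-and-forget a single statsd line over UDP.

    The default port is an assumption about the local exporter setup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Each worker process can emit this without holding any state.
    send_statsd(statsd_counter_line("myapp.jobs_processed"))
```

Because UDP is fire-and-forget, this runs fine even when nothing is listening, which makes it easy to leave in place while you experiment locally.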
Next you probably want to experiment with writing alerts. There are two ways to
do this; the first, which I would suggest, is to just run Prometheus locally,
scraping your app, and experiment with alert rules directly. It’s a single Go
tool with a basic web interface that you can use to see how things are
functioning. If you want to do something more complex, or you don’t want to
somehow break your app to see your alerts fire, you can use the tooling that
ships with Prometheus to test various inputs to queries to ensure that they
would fire as intended.
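To give a feel for what "testing various inputs" means, here is a toy model in plain Python. It is not the Prometheus tooling and it does not evaluate real queries; it only imitates the behaviour of a threshold alert with a `for` duration, where the condition has to hold continuously for that long before the alert moves from pending to firing. The function name `alert_states` and the sample data are made up for illustration.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each sample.

    samples is a list of (timestamp_seconds, value) pairs in time order.
    The alert is "pending" while the value is above the threshold but has
    not yet stayed there for for_seconds, "firing" once it has, and
    "inactive" otherwise. This is a simplified model, not Prometheus itself.
    """
    states = []
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held_for = ts - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            # Any dip below the threshold resets the clock.
            active_since = None
            states.append("inactive")
    return states


if __name__ == "__main__":
    # One sample per minute; the value spikes for three minutes.
    samples = [(0, 1), (60, 5), (120, 5), (180, 5), (240, 1)]
    for (ts, _), state in zip(samples, alert_states(samples, 3, 120)):
        print(ts, state)
```

Feeding in a synthetic series like this and asserting on the resulting states is the same shape of test you would write against real rules: known inputs, expected firing times.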
It’s interesting to me that in this world where things are migrating to HTTP/2, where nothing is simple anymore, such a simple monitoring system is flourishing. I for one find it refreshing.
I hope to write about our automation around deploying new alert rules soon, but even before we get there I am very pleased with how this is shaping up.
If you want to learn more about Prometheus, you might check out Prometheus: Up & Running.
Another option, which I have only glanced at so far, is Monitoring with Prometheus.
I have only spent a little time glancing at these two books and both of them have good stuff in them.
Posted Wed, Apr 10, 2019