Consolidated Alerting Using PagerDuty
Posted on June 19, 2011
There are a lot of interesting SaaS offerings available today but not many that get me all excited. I recently blogged about Duo Security; they get me excited. Another of my favorites is Mint. The most recent has been PagerDuty. I met the PagerDuty team at Velocity & DevOps Days this week and they are a really awesome bunch of folks, so I thought I’d give them a little love to show my support. (This isn’t sponsored; in all my years I’ve never made a dime from cuddletech and I never intend to.)
We all have several things in our infrastructures that alert. You probably have multiple monitoring systems, lots of software with built-in alerting capabilities, logging systems that alert, and even external SaaS such as Pingdom, Circonus, New Relic, Keynote, whatever. Managing all those contacts is a royal PITA. If you have a NOC you can just have them all send alerts to a mailing list which someone monitors and escalates as necessary, but if you have a small to midsize team it’s not realistic to have someone watching a list 24×7.
PagerDuty has a great many features but consolidation is by far the most exciting for me. Within PagerDuty you add each member of your staff as a user, and they themselves can add various contact methods which are escalated over time. For instance, you may want an email immediately when an incident occurs, an SMS at +5 minutes, a phone call at +10 minutes, a phone call to your land line at +15 minutes, etc. Then you create services which represent different alerting systems. So I have one for Zabbix, another for OpenNMS, another for Pingdom, etc. Each service has its own email address: serviceX@mycompany.pagerduty.com. Now you simply go to all your various alerting systems and point them at the service email address rather than at you.
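To make the per-user contact escalation concrete, here is a minimal sketch in Python of the example timeline above. It is only a model of the idea, not PagerDuty's API: the `NotificationRule` class and the `methods_fired` function are names I made up for illustration, and the delays are just the ones from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    """One contact method and how long to wait before using it (hypothetical model)."""
    delay_minutes: int
    method: str


# The example timeline from the paragraph above.
RULES = [
    NotificationRule(0, "email"),
    NotificationRule(5, "sms"),
    NotificationRule(10, "phone call"),
    NotificationRule(15, "land line call"),
]


def methods_fired(rules, minutes_unacknowledged):
    """Return the contact methods that would have fired so far, in order."""
    ordered = sorted(rules, key=lambda rule: rule.delay_minutes)
    return [r.method for r in ordered if r.delay_minutes <= minutes_unacknowledged]


if __name__ == "__main__":
    for minutes in (0, 7, 15):
        print(minutes, methods_fired(RULES, minutes))
```

The point of the model is that each person owns their own list of rules, so the alerting systems never need to know anyone's phone number.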
All this is layered up with multiple escalation lists and on-call rotation schedules which automatically change the primary and escalated contacts for one or more services. The great beauty is that if someone goes on vacation I make a change in one place rather than 10.
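The "change it in one place" idea can be sketched like this. Again, this is a toy model under my own assumptions, not how PagerDuty implements schedules: the names in `ROTATION`, the weekly hand-off, the start date, and the `escalation_chain` helper are all hypothetical.

```python
from datetime import date

# Hypothetical weekly rotation shared by every service.
ROTATION = ["alice", "bob", "carol"]
ROTATION_START = date(2011, 6, 20)  # assumed first day of the first on-call week

# Every service points at the same schedule instead of at a person.
SERVICES = ["zabbix", "opennms", "pingdom"]


def escalation_chain(day, rotation=ROTATION, start=ROTATION_START):
    """Return contacts in escalation order: primary first, then the rest."""
    weeks = (day - start).days // 7
    n = len(rotation)
    return [rotation[(weeks + i) % n] for i in range(n)]


def contacts_for_all_services(day, rotation=ROTATION):
    """Map each service to its escalation chain for the given day."""
    chain = escalation_chain(day, rotation)
    return {service: chain for service in SERVICES}


if __name__ == "__main__":
    print(contacts_for_all_services(date(2011, 6, 27)))
    # Someone goes on vacation: edit the rotation once, every service follows.
    without_bob = [name for name in ROTATION if name != "bob"]
    print(contacts_for_all_services(date(2011, 6, 27), without_bob))
```

Because the services only reference the schedule, taking someone out of the rotation is a single edit no matter how many alerting systems you have.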
In my case I only send “wake me up” grade alerts to PagerDuty; everything else goes to internal mailing lists, Jabber, etc.
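If your alerting system lets you script the notification step, that split might look roughly like the sketch below. It only builds the message and does not send anything. The service address is the placeholder from earlier in this post; the internal list address, the sender, and the set of severities that count as "wake me up" are assumptions for illustration.

```python
from email.message import EmailMessage

PAGERDUTY_SERVICE = "serviceX@mycompany.pagerduty.com"  # placeholder service address
INTERNAL_LIST = "ops-alerts@mycompany.example"          # hypothetical mailing list

# Hypothetical severities considered worth waking someone up for.
PAGE_WORTHY = {"critical", "disaster"}


def recipient_for(severity):
    """Pick the PagerDuty service address only for page-worthy alerts."""
    if severity.lower() in PAGE_WORTHY:
        return PAGERDUTY_SERVICE
    return INTERNAL_LIST


def build_alert(severity, summary, sender="zabbix@mycompany.example"):
    """Build (but do not send) an alert email routed by severity."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient_for(severity)
    msg["Subject"] = f"[{severity.upper()}] {summary}"
    msg.set_content(summary)
    return msg


if __name__ == "__main__":
    print(build_alert("critical", "db01 is down")["To"])
    print(build_alert("warning", "disk at 80% on web03")["To"])
```

Handing the resulting message to `smtplib` or your system's own mailer is left out on purpose, since that part depends entirely on your environment.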
Additionally, if you’re into the DevOps idea of putting your developers on-call, PagerDuty is a great way to facilitate that.
The service is a little pricey imho, but well worth the money. They provide a free 30-day eval period during which you can send all the SMS messages and calls you wish at no charge, so you have absolutely no reason not to at least give it a spin around the block.