Using Graphite to Graph DTrace Metrics

Posted on June 21, 2011

If you haven’t heard of Graphite, you are missing out on a serious operations power tool. Let me make a gross oversimplification and slightly inaccurate assertion to get you in the ballpark of understanding what it is: it’s RRDtool reimplemented for the web.

Let me be more specific for those new to it. Graphite is made up of three components. The first is “Carbon”, a metrics collection daemon that receives data over the network (its plaintext protocol listens on TCP port 2003 by default; a UDP listener exists but is off by default), caches the data and then writes it to disk. The second is “Whisper”, the round-robin database that Carbon uses to store your metrics on disk. The third is a Django app that generates graphs of your metrics via a snazzy web UI or via a simple URL API. So it implements a round-robin database like RRDtool and a means of graphing the data like RRDtool, but it’s accessible via a browser and draws graphs on demand, so unlike with RRDtool there is no need to pre-render static graphs at some interval.

There are three reasons I find Graphite hard to ignore. First, you do not need to pre-create your databases: if you send it a metric it hasn’t seen before, it simply creates the database based on a flexible schema configuration (Carbon’s storage-schemas.conf). Second, you get your graphs essentially in real time just by refreshing a URL, with no pre-generation. Third, you can send it metrics using something as simple as netcat. The result is an insanely flexible metrics graphing system that needs very little configuration and no dedicated agents.
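As a rough illustration of the netcat point, here is a minimal Bash sketch that pushes a single data point by hand. It assumes Carbon’s plaintext listener is on its default TCP port 2003 and reuses the 10.0.0.22 address from the script below as a placeholder; the send_metric function is hypothetical, not part of Graphite.

```bash
#!/bin/bash
# Hypothetical helper: send one data point to Carbon's plaintext listener.
# Assumes Carbon accepts the plaintext protocol on TCP port 2003 (its default).
send_metric() {
    local name=$1 value=$2
    local server=${3:-10.0.0.22}   # placeholder address, adjust for your setup
    local port=${4:-2003}
    # Plaintext protocol: "<metric.path> <value> <unix epoch seconds>", one per line.
    echo "$name $value $(date +%s)" | nc "$server" "$port"
}
```

For example, `send_metric test.hello 42` sends a point for test.hello, and Carbon creates its database the first time it arrives. Depending on which netcat you have, you may need an option such as `-q 0` (traditional/Debian netcat) or `-N` (OpenBSD netcat) so that nc exits once its input is exhausted.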

So let me demonstrate how we can use all this power together with DTrace in a sample script:

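#!/bin/bash
# Example DTrace/Graphite Integration
# Ben Rockwood

# Exported so the shell spawned by the system() action in the D program can expand them.
export HOSTNAME=$(hostname)
export GRAPHITE_SERVER="10.0.0.22"

/usr/sbin/dtrace -n '

#pragma D option destructive
#pragma D option quiet

BEGIN
{
        mycounter = 0;
}

/* Count read(2) calls. A global variable is not updated atomically, so
   simultaneous reads on different CPUs may occasionally lose a count;
   close enough for a demo. */
syscall::read:entry
{
        mycounter++;
}

/* Every second: send metric, value and epoch seconds to Carbon on TCP port 2003,
   then reset the counter. walltimestamp is in nanoseconds, hence the division. */
tick-1sec
{
        /* system("echo \"DEBUG: Sending data to metric dtrace.$HOSTNAME.syscall.read.entry on server $GRAPHITE_SERVER\""); */
        system("echo \"dtrace.$HOSTNAME.syscall.read.entry %d %d\" | nc $GRAPHITE_SERVER 2003",
                     mycounter, walltimestamp / 1000000000);
        mycounter = 0;
}
'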

So what I’m doing here is running a DTrace script via Bash. I’m using Bash as a wrapper so that I can do setup such as getting the hostname. The DTrace script itself is deliberately simplistic: every read system call increments a counter. The “tick-1sec” probe fires once a second, runs a system command to send the current count, and then resets the counter. Because system() is a destructive action, DTrace refuses to run it unless the destructive option is set, which is why that pragma is there.
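The wrapper exports HOSTNAME and GRAPHITE_SERVER because the D program sits inside single quotes, so Bash never expands $HOSTNAME itself; DTrace’s system() hands its string to a shell, which expands the variable from the environment it inherited. The sketch below imitates that hand-off without DTrace, using `sh -c` in place of system() and plain echo in place of nc. The hostname, count and timestamp are made-up values.

```bash
#!/bin/bash
# Imitates what the tick-1sec clause does, minus DTrace and nc.
export HOSTNAME=newton     # made-up value; the real wrapper uses $(hostname)
mycounter=42               # made-up per-second read count
epoch=1308614400           # made-up walltimestamp / 1000000000

# Single-quoted part: $HOSTNAME stays literal, as it does inside the D program.
# Double-quoted part: count and timestamp are filled in, as the %d arguments are.
cmd='echo "dtrace.$HOSTNAME.syscall.read.entry '"$mycounter $epoch"'"'

# A child shell expands $HOSTNAME from its inherited environment, like system() does.
line=$(sh -c "$cmd")
echo "$line"
```

Without the export, the child shell would not inherit the wrapper’s value, and the metric path could end up with an empty or unexpected host component.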

The system command we’re executing simply echoes the metric in Graphite’s format and pipes it to netcat (“nc”), which sends it to port 2003 on the Graphite server. The format is simple: “some.metric.name value epoch_time”, one metric per line. My metric here will be dtrace.newton.syscall.read.entry. (Newton is my workstation.)
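Because Graphite treats every dot in a metric name as a level in its tree, a fully qualified hostname such as web1.example.com would add extra levels to the path, and a single `*` in place of the host would no longer match it. Here is a hedged sketch of a helper that builds a line in this format and swaps dots in the hostname for underscores; the function name and the underscore convention are assumptions of this sketch, not something Graphite requires.

```bash
#!/bin/bash
# Hypothetical helper: format one line of Graphite's plaintext protocol,
# "<metric.path> <value> <epoch seconds>", for a per-host metric.
metric_line() {
    local host=$1 suffix=$2 value=$3
    local ts=${4:-$(date +%s)}    # default to "now" in Unix epoch seconds
    # Dots separate levels in a Graphite path, so flatten the hostname.
    host=${host//./_}
    printf '%s %s %s\n' "dtrace.$host.$suffix" "$value" "$ts"
}
```

Piping its output to `nc $GRAPHITE_SERVER 2003` sends it, much as the echo in the D script does.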

I start the script and then browse to the following URL:

http://10.0.0.22:8888/render/?width=400&height=250&target=dtrace.newton.syscall.read.entry&from=-1hours
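The same render URL can also return numbers instead of a PNG, which is handy for checking that data is arriving. A minimal sketch, assuming the render API’s format parameter (recent Graphite releases accept values such as csv and json besides the default png) and reusing the example server address; render_url is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper for Graphite's /render URL API.
# Base address taken from the example URL; override GRAPHITE_WEB as needed.
GRAPHITE_WEB=${GRAPHITE_WEB:-http://10.0.0.22:8888}

render_url() {
    local target=$1
    local from=${2:--1hours}     # relative time, same syntax as the example URL
    local format=${3:-png}       # png draws a graph; csv/json return the data points
    echo "$GRAPHITE_WEB/render/?width=400&height=250&target=$target&from=$from&format=$format"
}
```

For instance, `curl -s "$(render_url dtrace.newton.syscall.read.entry -10min csv)"` should print the last ten minutes of points as text.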

Loading that URL in a browser returns a 400x250 PNG graph of the read counts newton has reported over the last hour.

See how flexible it is? If I wanted to run this on four web servers, I could fire up the script, unmodified, on all four and then simply change the hostname in the URL’s target from “newton” to “*”; Graphite would then graph all four servers together, without my even having to log onto the Graphite server. This is why I love Graphite: it’s so flexible you can cram it in pretty much anywhere and get useful data in a pinch.
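To make the wildcard idea concrete, here are two URLs side by side: one that draws a separate line per host, and one that uses Graphite’s sumSeries() function to add the hosts into a single total. The server address comes from the example; which hosts the wildcard matches depends on which machines are actually reporting.

```bash
#!/bin/bash
# Build two render URLs for the per-host read metric (base taken from the example URL).
base="http://10.0.0.22:8888/render/?width=400&height=250&from=-1hours"

# "*" matches exactly one path level, so this yields one series per hostname.
per_host="$base&target=dtrace.*.syscall.read.entry"

# sumSeries() collapses every matching series into a single combined line.
total="$base&target=sumSeries(dtrace.*.syscall.read.entry)"

printf '%s\n' "$per_host" "$total"
```

Either string can be pasted into a browser; when fetching with curl, keep the URL quoted so the shell does not treat `*` or the parentheses specially.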

Word of warning: the script above is intentionally oversimplified. My point here is to illustrate the basic principles, nothing more.