Using Graphite to Graph DTrace Metrics: Part II

Posted on November 14, 2011

In a previous entry I described Graphite and gave an overly simplistic example of integrating it with DTrace… let's get a little more serious and see what fun we can have.

For years a problem nagged at me.  I wanted to get really fine-grained latency information from an NFS server to track user experience.  This isn't an easy thing to do, especially for hundreds of exports.  First off, you have to use DTrace to get that kind of data; there isn't really any other way to find per-operation latency on a per-export basis.  Secondly, writing all that data into local RRDs is a massive I/O problem in its own right.  Thirdly, graphing the data once it's in RRD isn't hard, but creating summary "rollup" graphs (i.e., all instances of a metric in a single graph) requires writing scripts that mash all the individual RRDs together, which of course is a pain in a dynamic environment.  And that's just for starters.  When you dig deeper into this problem you just find other, smaller, problems.

Many solutions were tried, but only Graphite made the final cut.  In particular: it's network-based with no agents; databases are created dynamically, so if new instances come or go the system simply adapts with no administration required; and, most importantly, Graphite creates graphs based on a simple "URL API".  This all means that we can dynamically add metrics to Graphite and just as easily graph them dynamically, and that means we can get maximum power out of DTrace's ultimate weapon: the aggregate!

So our goal is to create DTrace scripts whose aggregate output is transported into Graphite. There are many ways to do this, but I really wanted something that could be controlled by a single script and managed via SMF. After several iterations I arrived at a solution using Perl that forks DTrace scripts whose output is piped into a helper script, which parses and transmits the data to Graphite. Let's look at each piece.

First, the control script. This simply forks the DTrace scripts, piping their STDOUT into the STDIN of the helper script.

#!/usr/perl5/bin/perl
#
# Control script for Per-Export NFS Latency Graphite Metrics
# 

my @SCRIPTS = ('read','write');

foreach my $i (@SCRIPTS) {
        my $WORKER = fork();
        die("fork failed: $!\n") unless defined $WORKER;
        if ($WORKER == 0) {
                # Child: replace ourselves with the DTrace-to-helper pipeline
                exec("./nfsv3-latency.d/nfsv3-${i}-latency.d |
                     ./nfsv3-latency.d/graphite-nfsv3-assist.pl ${i}")
                        or die("exec failed: $!\n");
        }
        print("Forked PID $WORKER for $i I/O\n");
}

The DTrace scripts are very simple: we trace entry and return of rfs3_read (the server-side function for processing NFSv3 reads) and load the export path and latency in milliseconds into an aggregate. Every 10 seconds, we print and clear the aggregate.

#!/usr/sbin/dtrace -s

#pragma D option quiet

rfs3_read:entry
{ 
        self->time = timestamp; 
        self->start = 1; 
        self->export =  stringof(args[2]->exi_export.ex_path);
} 

rfs3_read:return
/self->start == 1/
{ 
        this->elapsed   = timestamp;
        this->ms        = (this->elapsed - self->time)/1000000;

        @read[self->export] = avg(this->ms);

        /* Release the thread-local variables */
        self->start  = 0;
        self->time   = 0;
        self->export = 0;
}

tick-10sec
{
        printa(@read);
        trunc(@read);
}

The write DTrace script is the same, just substituting "write" for "read".

Now for the helper script that parses the aggregates and transmits the data to Graphite. Here we create a TCP session to the Graphite server, parse STDIN into its two components, which in this case are export path and latency, do some sanity checking to make sure the data looks correct, and finally send the key/value pair to the Graphite server:

#!/usr/perl5/bin/perl
#
# GraphiteAssist v0.1
#  
#
# The primary purpose is to provide a way
# for DTrace Aggregates to be injected into Graphite
#

use IO::Socket;

## Default Values:
my $GRAPHITE_SERVER = "graphite.server.com";
my $GRAPHITE_PORT   = 2003;

if ( ! $ARGV[0] ) {
        die("USAGE: $0 <metric>\n");
}
my $METRIC = $ARGV[0];

my $HOSTNAME = `hostname`;
chomp($HOSTNAME);

## Prep the socket
my $sock = IO::Socket::INET->new(
    Proto    => 'tcp',
    PeerPort => $GRAPHITE_PORT,
    PeerAddr => $GRAPHITE_SERVER,
) or die "Could not create socket: $!\n";


while (<STDIN>) {
  chomp($_);
  $_ =~ s/^\s+//; # Trim any leading whitespace
  my ($EXPORT,$VALUE,$OTHER) = split(/\s+/, $_, 3);


  ### Sanity check on the input data
  if ($OTHER) {
       # print("I got some other crap here: $OTHER (Input: $_)\n");
        next;
  }
  if ($EXPORT !~ m/\w+/) {
       # print("Export looks wrong: $EXPORT (Input: $_)\n");
        next;
  }
  if ($VALUE !~ m/^\d+$/) {
       # print("Value looks wrong: $VALUE (Input: $_)\n");
        next;
  }

  my $KEY = "joyent.${HOSTNAME}.exports.${EXPORT}.latency_${METRIC}";

  my $DATE = time();
  #print("Sending: $KEY $VALUE $DATE\n");
  $sock->send("$KEY $VALUE $DATE\n") or die "Send error: $!\n";

}
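For reference, the wire format the helper speaks is Graphite's plaintext protocol: one "key value unix-timestamp" line per metric, sent over TCP to the carbon port (2003 by default). Here's a minimal sketch of that formatting in Python (the hostname and key in the example are made up, not from a real system):

```python
import time

def graphite_line(key, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    a space-separated "key value timestamp" line ending in a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{key} {value} {timestamp}\n"

# Same shape the Perl helper sends over its socket:
line = graphite_line("joyent.nfs01.exports./export/home.latency_read", 12, 1321272000)
```

Anything that can emit lines of this shape to that port becomes a Graphite data source, which is why no agent is needed.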

There you have it. We can take it a step further by controlling this via SMF, but I’ll leave that part as an exercise for the reader.

The scripts above are somewhat crude, but they demonstrate the pattern. You can use it to graph anything that DTrace can see, which is… everything. I've used this same pattern for monitoring VFS latency on a large scale, as well as MySQL query latency and various types of throughput.

It's the Graphite URL API that really makes this powerful, because I can glob for keys. For instance, the following URL would render ALL export latency (read and write for each export) for the last hour. (This is a single URL, but I'm breaking it apart a bit to make the various arguments passed to render clear.)

http://graphite.server.com/render/?
 width=800&height=600&
 target=joyent.nfs-server.exports.*.*.*.latency_*&
 tz=utc&
 from=-1hours
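As a sketch of how such URLs might be assembled programmatically (the function name and parameter defaults here are my own, not part of Graphite; the render endpoint accepts standard URL-encoding of the glob characters):

```python
from urllib.parse import urlencode

def render_url(base, target, width=800, height=600, frm="-1hours", tz="utc"):
    """Build a Graphite /render URL. 'target' may contain * globs, which
    Graphite expands to every matching metric series at render time."""
    params = {"width": width, "height": height, "target": target,
              "tz": tz, "from": frm}
    return f"{base}/render/?{urlencode(params)}"

url = render_url("http://graphite.server.com",
                 "joyent.nfs-server.exports.*.*.*.latency_*")
```

Because the target is a glob rather than a fixed list, a new export starts appearing in the rollup graph the moment its first data point arrives, with no changes anywhere.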

DTrace is a fabulous means of obtaining hard-to-get data, and Graphite is a fabulous means of graphing hard-to-graph data… combined they can accomplish almost anything.