Cuddletech | Taking DTrace to the Next Level

Taking DTrace to the Next Level

Posted on March 14, 2009

My relationship with DTrace has been a rocky and strange sort of round trip. Like most folks, when it first arrived I built up a stash of one-liners, generally to expand upon output from vmstat and the like, such as to show execname for each page out or something. With Brendan Gregg’s DTrace Toolkit I found several fun well written script and used them where appropriate. But frankly, serious DTrace use seemed to me to be akin to mdb use… most people should only bother scratching the surface, only kernel engineers need descend deeper. And, I’m not a kernel engineer.

All that changed when I needed answers to questions about emerging Solaris technologies (SMF, ZFS, COMSTAR, etc.) that were too new to get help from others. I’m not a kernel engineer… but DTrace can make me look like one by pairing up the source code with DTrace stack aggregations. A whole new world opened up and DTrace and I have been best friends since.

I’ve now moved out of the exploratory era and into the sustained monitoring era with DTrace. Not only do I want to run DTrace scripts interactively, not only do I want to run them periodically via some management agent, rather I want to run persistent traces.

Starting a DTrace session that just runs forever presents two initial problems:

Is the tracing going to suck up resources or let buffers grow incrementally over days or weeks until the trace impacts the running environment?
How do you collect the data? Pipe to file?

So far, in my experience, the first concern is put to rest. I’ve got several traces running that I keep a regular watch on for any growth… and day after day they remain unchanged, sturdy as a rock. This makes me happy. Please note that I am not using aggregations or the like.

Collecting the data in some rational way seemed more difficult. Sure, I could pipe the output somewhere or use the dtrace -o some/file but that just doesn’t seem very flexible or professional to me. So I came up with a solution I like much better: Syslog!

Syslog is an amazing thing that is all too often under appreciated. Messages are sent to the daemon, which then routes them locally to a file or across the network to another syslog daemon which does the same. Combined with Splunk there isn’t much that Syslog can’t do. Syslog rules. To use syslog with DTrace there are two things to understand:

Firstly, using the Solaris logger command you can write messages to syslog from anywhere. This is super handy to embed in scripts, such as to signify that a backup is complete or some service is restarting or whatever. Here is an example:

root@quadra ~$ logger -p daemon.err 'Nightly Backup Failed!'
root@quadra ~$ dmesg | tail -1
Mar 14 01:56:08 quadra root: [ID 702911 daemon.error] Nightly Backup Failed!
root@quadra ~$ tail -1 /var/adm/messages
Mar 14 01:56:08 quadra root: [ID 702911 daemon.error] Nightly Backup Failed!

Now, DTrace has the beautiful ability to run System Commands from within a clause. The trick is that this is considered a “destructive action”, meaning it actually can modify the running state rather than simply observe it. When enabling a script to envoke destructive actions you should be very careful and trust no one, read the script before running! You can enable these actions on the command line with the -w option or by using the destructive pragma. The system() function in DTrace works much like printf() so variable substitution is easy.

So now lets look at a script that puts these two together. The script here enabled destructive actions so that we can use the system() function and calls out to logger to syslog output:

#!/usr/sbin/dtrace -s

/* Syslogging DTrace Script;  benr. */

#pragma D option quiet
#pragma D option destructive

fbt::*_fsync:entry
/ execname != "logger" /
{
        self->caller = execname;
        self->start  = timestamp;
}
  
fbt::*_fsync:return
/self->start/
{
        this->end      = timestamp;
        this->duration = (this->end - self->start)/1000000;

        system("/bin/logger -p local1.info %s by %s in %d ms", probefunc, self->caller, this->duration);

        self->start = 0;
}

If we then setup syslog by adding some the following lines to /etc/syslog.conf and restarting syslog…

## Dtrace:
local1.info                                     /var/log/dtrace
"/etc/syslog.conf" 62 lines, 1902 characters 

root@quadra ~$ touch /var/log/dtrace
root@quadra ~$ svcadm restart system-log

Now we start the script and tail that new logfile:

root@quadra ~$ tail -f /var/log/dtrace 
Mar 14 02:20:35 quadra root: [ID 702911 local1.info] fop_fsync by ps in 0 ms
Mar 14 02:20:35 quadra root: [ID 702911 local1.info] fop_fsync by auditd in 0 ms
Mar 14 02:21:08 quadra last message repeated 7805 times
Mar 14 02:21:08 quadra root: [ID 702911 local1.info] fop_fsync by nscd in 0 ms
Mar 14 02:21:08 quadra last message repeated 3 times
Mar 14 02:21:08 quadra root: [ID 702911 local1.info] fop_fsync by auditd in 0 ms
Mar 14 02:21:17 quadra last message repeated 2282 times
Mar 14 02:21:17 quadra root: [ID 702911 local1.info] fop_fsync by nscd in 0 ms
Mar 14 02:21:17 quadra last message repeated 3 times
Mar 14 02:21:18 quadra root: [ID 702911 local1.info] fop_fsync by auditd in 0 ms

If you need a refresher in syslog, please refer to my older post: Centralizing Syslog on Solaris.

Please understand! This is ONLY advisable for very low frequency probes! Running a system command from the script is not a lightweight thing and probes can fire a hell of a lot faster than shell commands can be evoked. Even the above example was a bad one, once I realized that auditing is a little overzealous. NEVER EVER EVER do this in a production environment without thoroughly testing it in your test environment. Moreover, if you misuse the system() function, don’t blame DTrace!

Now, this isn’t the only way or even the best way to syslog dtrace. Some other options exist such as piping to logger or even using an SMF wrapper to do so, such as Sam Falkner wrote some time ago. I implemented this way for two reasons: 1) because it works, and 2) because it demonstrates a way to extend your DTrace scripts in new ways using system() commands. My hope is that this excites in your mind some new and interesting uses.