Sending Email with Attachments from the Command Line

January 19th, 2012

I have lots of awesome CLI-based reporting tools. One was so awesome that other people in the company wanted to get it on a regular basis, but they preferred to see it as CSV so it could be manipulated in Numbers or Excel. Modifying my report to output CSV was easy: I just added a conditional that replaced my pretty column-formatted printf() with an ugly comma-separated printf(). Sending CSV in email is easy too, just pump it into “sendmail -t”.

I quickly realized that using sendmail “as usual” sucked, because the CSV was in the body of the message, not an attachment. The solution was to send a Multi-Part MIME message. Doing so is easier than you think.

Let's look at a template example, piece by piece:

From: $FROM
To: $TO
Date: $DATE
Subject: $SUBJECT
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary="ATTACHMENT-BOUNDRY"
Return-Receipt-To: $FROM

--ATTACHMENT-BOUNDRY
Content-Type: text/plain; charset=US-ASCII

Some body stuff here, this is your message

Notice above that From, To, and Date are all pretty standard stuff. What is special is that we specify the MIME version (1.0) and then set the Content-Type to “multipart/mixed”. Following that is a boundary string. A boundary string is an arbitrary string that separates the different parts of your message. In our case, it separates the body from the attachments, but it can also be used for providing both HTML and plain text versions of a message in a single mail. The body gets its own part after the first boundary because MIME-aware mail clients ignore any text placed before the first boundary.

--ATTACHMENT-BOUNDRY
Content-Disposition: attachment;
        filename="$FILENAME1"
Content-type: text/plain;
        charset=US-ASCII;
        name="$FILENAME1"
Content-Transfer-Encoding: quoted-printable

$ATTECHMENT_DATA1

The next section of our message is marked by the boundary string prefixed by two dashes (--). Note that the dashes go before but not after the boundary string! Next is the metadata about this portion of the message, namely the Content-Type, encoding, and disposition.

It is important to note that Mail.app (OS X) is more strict about attachments than Thunderbird or Gmail. If you do not include a content-disposition it will register the section as just another part of the body. Mail.app requires that you be very careful about syntax, whereas Thunderbird and Gmail have a "I know what you meant" attitude.

--ATTACHMENT-BOUNDRY
Content-Disposition: attachment;
        filename="$FILENAME2"
Content-type: text/plain;
        charset=US-ASCII;
        name="$FILENAME2"
Content-Transfer-Encoding: quoted-printable

$ATTECHMENT_DATA2

--ATTACHMENT-BOUNDRY--

Here we have a second attachment. We could add as many as we wish, but notice that the message ends with our boundary string again, now surrounded by dashes front and back. This signifies the end of our parts.

That's really about it. Pump all this into "sendmail -t" (i.e., cat mymail.txt | sendmail -t, or equivalent) and away your mail goes.

One word about attachment encoding. Above, the Content-Transfer-Encoding of the attachments was "quoted-printable". That or 8bit is fine for normal text such as CSV, but if you wish to send binary data you will want to base64 encode it (see BASE64(1) for syntax) and set the Content-Transfer-Encoding to "base64".
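For comparison, here is a minimal sketch of the same multipart/mixed layout built with Python's standard email package instead of a hand-written template. It is written in Python rather than the Perl and shell I used, and the function names, addresses, and the /usr/sbin/sendmail path are all assumptions for illustration. Python picks its own boundary string and transfer encoding, so the output will not match the template above byte for byte.

```python
import subprocess
from email.message import EmailMessage
from email.utils import formatdate


def build_report_mail(sender, recipient, subject, body, attachments):
    """Build a multipart/mixed mail; attachments is a list of (filename, csv_text)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = subject
    msg.set_content(body)  # the body becomes the first text/plain part
    for filename, csv_text in attachments:
        # Each attachment gets Content-Disposition: attachment plus its filename.
        msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg


def send_via_sendmail(msg, sendmail="/usr/sbin/sendmail"):
    """Pipe the message to 'sendmail -t' (the binary path is an assumption)."""
    subprocess.run([sendmail, "-t"], input=msg.as_bytes(), check=True)


if __name__ == "__main__":
    mail = build_report_mail(
        "reports@example.com", "team@example.com", "Weekly report",
        "The report is attached as CSV.\n",
        [("report.csv", "host,latency_ms\nnfs1,12\n")],
    )
    print(mail.as_string())  # inspect the result before sending it anywhere
```

Printing the message first is a cheap way to check that every attachment carries a Content-Disposition header before a strict client like Mail.app sees it.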

LISA Keynote 2011: The DevOps Transformation

December 16th, 2011

Last week I was given the incredible opportunity to not only speak at LISA but to deliver the opening keynote.  I hadn’t expected to even go, but when I learned the topic was DevOps I made a last-minute plea on the eve of the submission deadline for a slot to deliver a talk I was calling “The 60 Minute MBA”, a history of Operations Management.  My hope was that I could get some obscure timeslot so a handful of people could geek out with me on Operations Management and LEAN and how it is helping to fuel and direct a lot of the DevOps thinking out there.  To my great shock I was told I was given the keynote slot… frankly, something I didn’t want for fear of the stress associated with it, but Tom felt I should step up and that I’d do great.

I haven’t blogged much in the last year and when I have it’s on topics you probably wouldn’t expect from a “Solaris blogger”.  I’ve held back most of what I want to talk about and only let the cream rise to the top.  My already frantic reading backlog only intensified as I was trying to pack as much into my talk as possible and ensure I was accurate.  Everything I read, watched, attended or did was reshaping my talk and I essentially spent 6 months “on stage” in my mind.   The problem I really had was that I had maybe 6 hours of content that I needed to condense into a 1 hour slot, hitting the essentials but not diluting its potency.  And, of course, I’m still learning every day.  Only 2 weeks prior to my talk did I finally hammer out a rough slide deck and I then had to keep pushing it around into something moderately cohesive.  Trying to find ways to address wisdom, systems thinking, agile, lean, TPS, OM and OR, and tie all this back to DevOps was a challenge.

To make things more challenging, Tamarah’s (my wife, seen above) due date for our 5th child is the 14th of Dec and the talk was to happen at 9:30AM Eastern time, which is 6:30AM Pacific and I’m not a morning person.  So… all things considered, I did pretty well, but you will notice in my talk that I was a little slower than I normally would be.  The upshot, however, was that I didn’t ramble much which kept me on my time marks.

What was interesting to me was what different people walked away with. Some people really keyed in on the value chain and asking “Why?”. Others wanted to rediscover ITIL because it was the first time they had heard it didn’t suck. Others got interested in operations management and LEAN, something they’d heard of but didn’t know where to start learning more. Others keyed on the collaboration of devops and bringing teams together. There was, I think, something for everyone and I didn’t hear any negative feedback on that talk beyond some people liking some parts and not caring about others… and it was designed that way.

Two things I want to note for viewers. First, when I said “by men I mean the human race”, I should have better explained that I think of “men” in a JRR Tolkien sense, the “race of man”. Secondly, at the very end I bagged on Sun TechPubs… I didn’t really explain myself and someone took offense to it. The fault was not on Sun’s writers, but rather on the engineering managers who wouldn’t permit writers the access to engineering they needed, so TechPubs was left to figure it out themselves. The fault was squarely on the engineering managers, NOT on the writers. Given the circumstances they have always turned out amazing documentation and I have nothing negative to say about the writers (as I noted in my answer, I wanted to be one at one time).

Anyway.  The following is the keynote, the slides can be found here.


 

I referenced a lot of books, and many have asked for the list of books, so here it is.

Please note! I do not profit from any of this in any way, I’m not getting a book kick back or whatever.  My only source of income is my Joyent salary.

The Essential Books you should read to put DevOps, ITIL/ITSM, LEAN and Operations Management into perspective and educate yourself for the future:

  1. The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps
  2. Any Operations Management textbook
  3. Web Operations: Keeping the Data On Time
  4. Lean IT: Enabling and Sustaining Your Lean Transformation

The Advanced Books you can read to dig behind the ideas, this is my “Best Of” list:

  1. My Philosophy of Industry & Moving Forward (Henry Ford)
  2. Today and Tomorrow (Henry Ford)
  3. The Principles of Scientific Management (Taylor)
  4. The Toyota Production System (Ohno)
  5. Out of the Crisis (Deming)
  6. The New Economics (Deming)
  7. Management Challenges for the 21st Century (Drucker)
  8. The Goal (Goldratt)
  9. Critical Chain (Goldratt)
  10. Creating the Corporate Future (Ackoff)
  11. Future Shock (Toffler)

One book mentioned in my talk that I do not own, nor have I read, is Lean Startup by Eric Ries, which is based largely on The Four Steps to the Epiphany, a book I did buy at the MIT Press bookstore after my keynote. “Lean Startup” is popular, but all he’s really doing is applying LEAN concepts and Agile methodologies to the startup. There are hundreds of “Lean XYZ” books. I am personally interested in the real deal, not books about other books. “LEAN IT” is my one exception because it can be a big time saver and I feel it gives proper credit to the history and sources of the ideas it espouses.

Finally, rather than give you a “fire hose” list of everything, I’ll simply include a picture of what I feel is a very complete library on these various topics.  The handful of books missing from these shelves are PDFs on my iPad such as the official  “ITILv3 2011 Update”, several books on Engineering Systems, etc.  Click the image to see it high-res.

 

Using Graphite to Graph DTrace Metrics: Part II

November 14th, 2011

In a previous entry I described Graphite and gave an overly simplistic example of integrating it with DTrace… let’s get a little more serious and see what fun we can have.

For years a problem nagged at me.  I wanted to get really fine-grained latency information from an NFS server to track user experience.  This isn’t an easy thing to do, especially for hundreds of exports.  First off, you have to use DTrace to get that kind of data; there isn’t really any other way to find per-operation latency on a per-export basis.  Secondly, writing all that data into local RRDs is a massive I/O problem in its own right.  Thirdly, graphing the data once it’s in RRD isn’t hard, but creating summary “rollup” graphs (i.e., all instances of a metric in a single graph) requires writing scripts that mush all the individual RRDs together, which of course is a pain in a dynamic environment.  And that’s just for starters.  When you dig deeper into this problem you just find other, smaller, problems.

Many solutions were tried but only Graphite made the final cut.  In particular, it’s network based with no agents; databases are dynamically created, so if new instances come or go the system simply adapts and there is no administration required; and most importantly, Graphite creates graphs based on a simple “URL API”.   This all means that we can dynamically add metrics to Graphite and just as easily dynamically graph them, and that means we can get maximum power out of DTrace’s ultimate weapon: the aggregate!

So our goal is to create DTrace scripts which output aggregates that are then transported into Graphite. There are many ways to do this, but I really wanted something that could be controlled by a single script and managed via SMF. After several iterations I arrived at a solution using Perl that forks dtrace scripts which feed data via STDIN to a helper script to parse and transmit the data to Graphite. Let’s look at each piece.

First, the control script. This simply forks the DTrace scripts and pipes STDOUT to STDIN of the helper script.

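#!/usr/perl5/bin/perl
#
# Control script for Per-Export NFS Latency Graphite Metrics
#

use strict;
use warnings;

my @SCRIPTS = ('read', 'write');

foreach my $i (@SCRIPTS) {
        my $WORKER = fork();
        die("fork failed: $!\n") unless defined $WORKER;
        if ($WORKER == 0) {
                # Child: replace ourselves with the DTrace-to-helper pipeline.
                exec("./nfsv3-latency.d/nfsv3-${i}-latency.d | " .
                     "./nfsv3-latency.d/graphite-nfsv3-assist.pl ${i}")
                        or die("exec failed for $i: $!\n");
        }
        # Parent: report the child and move on to the next script.
        print("Forked PID $WORKER for $i I/O\n");
}

# Parent: stay alive until every worker has exited.
1 while wait() != -1;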

The DTrace scripts are very simple: we trace entry and return of rfs3_read (the server-side function for processing NFSv3 reads) and load the export path and latency in ms into an aggregate. Every 10 seconds, we output and clear the aggregate.

#!/usr/sbin/dtrace -s

#pragma D option quiet

rfs3_read:entry
{
        self->time = timestamp;
        self->start = 1;
        self->export =  stringof(args[2]->exi_export.ex_path);
} 

rfs3_read:return
/self->start == 1/
{
        this->elapsed   = timestamp;
        this->ms        = (this->elapsed - self->time)/1000000;

        @read[self->export] = avg(this->ms);

        self->start = 0;
}

tick-10sec
{

        printa(@read);
        trunc(@read);
}

The write DTrace script is the same, just substituting in “write” instead of “read”.

Now for the helper script that parses the aggregates and transmits the data to Graphite. Here we create a TCP session to the Graphite server, parse the STDIN into its 2 components, which in this case are export path and latency, then do some sanity checking to make sure data looks correct and finally send the key/value pair to the Graphite server:

#!/usr/perl5/bin/perl
#
# GraphiteAssist v0.1
# 
#
# The primary purpose is to provide a way
# for DTrace Aggregates to be injected into Graphite
#

use IO::Socket;

## Default Values:
my $GRAPHITE_SERVER = "graphite.server.com";
my $GRAPHITE_PORT   = 2003;

if ( ! $ARGV[0] ) {
        die("USAGE: $0 <metric>\n");
}
my $METRIC = $ARGV[0];

my $HOSTNAME = `hostname`;
chomp($HOSTNAME);

## Prep the socket
my $sock = IO::Socket::INET->new(
    Proto    => 'tcp',
    PeerPort => $GRAPHITE_PORT,
    PeerAddr => $GRAPHITE_SERVER,
) or die "Could not create socket: $!\n";

while (<STDIN>) {
  chomp($_);
  $_ =~ s/^\s+//; # Trim any leading whitespace
  my ($EXPORT,$VALUE,$OTHER) = split(/\s+/, $_, 3);

  ### Sanity check on the input data
  if ($OTHER) {
       # print("I got some other crap here: $OTHER (Input: $_)\n");
        next;
  }
  if ($EXPORT !~ m/\w+/) {
       # print("Export looks wrong: $EXPORT (Input: $_)\n");
        next;
  }
  if ($VALUE !~ m/\d+/) {
       # print("Value looks wrong: $VALUE (Input: $_)\n");
        next;
  }

  my $KEY = "joyent.${HOSTNAME}.exports.${EXPORT}.latency_${METRIC}";

  $DATE = time();
  #print("Sending: $KEY $VALUE $DATE\n");
  $sock->send("$KEY $VALUE $DATE\n") or die "Send error: $!\n";

}

There you have it. We can take it a step further by controlling this via SMF, but I’ll leave that part as an exercise for the reader.

The scripts above are somewhat crude but they demonstrate the pattern here. You can use it to graph anything that DTrace can see, which is… everything. I’ve used this same pattern for monitoring VFS latency on a large scale, as well as MySQL query latency, and various types of throughput.
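The parse-and-format step of the helper can also be sketched in Python (used here instead of Perl purely for illustration). The function names are hypothetical; the key layout mirrors the Perl script above, and like that script the sketch does not rewrite the export path. Each line it produces is what gets written to the Graphite TCP socket: a key, a value, and an epoch timestamp, separated by spaces and ending in a newline.

```python
import re
import time


def parse_aggregate_line(line):
    """Return (export, value) for a two-column printa() line, else None."""
    fields = line.split()
    if len(fields) != 2:
        return None  # blank lines or unexpected extra columns
    export, value = fields
    if not re.search(r"\w", export) or not value.isdigit():
        return None  # same sanity checks as the Perl helper
    return export, int(value)


def graphite_line(hostname, export, metric, value, timestamp=None):
    """Format one line of Graphite input: key, value, epoch seconds."""
    if timestamp is None:
        timestamp = int(time.time())
    key = f"joyent.{hostname}.exports.{export}.latency_{metric}"
    return f"{key} {value} {timestamp}\n"


if __name__ == "__main__":
    sample = "  /export/home                 12"
    parsed = parse_aggregate_line(sample)
    if parsed is not None:
        print(graphite_line("nfs-server", parsed[0], "read", parsed[1]), end="")
```

Keeping parsing and formatting separate from the socket code makes it easy to test the pattern against captured DTrace output before pointing it at a real Graphite server.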

It’s the Graphite URL API that really makes this powerful, because I can glob for keys. For instance, the following URL would render ALL export latency (read/write for each export) for the last 1 hour. (This is a single URL, but I’m breaking it apart a bit to make the various arguments passed to render clear.)


http://graphite.server.com/render/?

 width=800&height=600&
 target=joyent.nfs-server.exports.*.*.*.latency_*&
 tz=utc&
 from=-1hours
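Since the URL is just a set of query arguments, it is easy to generate. The following Python sketch (the function name and defaults are assumptions, chosen to match the example above) assembles the same render URL and leaves the * globs unescaped:

```python
from urllib.parse import urlencode


def render_url(base, target, width=800, height=600, tz="utc", from_="-1hours"):
    """Assemble a Graphite render URL; '*' is kept literal so globs stay readable."""
    query = urlencode(
        {
            "width": width,
            "height": height,
            "target": target,
            "tz": tz,
            "from": from_,
        },
        safe="*",
    )
    return f"{base}/render/?{query}"


if __name__ == "__main__":
    print(render_url(
        "http://graphite.server.com",
        "joyent.nfs-server.exports.*.*.*.latency_*",
    ))
```

Swapping the target string is all it takes to graph a different slice of the key space, for example only the read latency keys.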

DTrace is a fabulous means of obtaining hard to get data, and Graphite is a fabulous means of graphing hard to graph data… combined they can accomplish almost anything.

CEO of NeXT Computer Dies

October 6th, 2011

Steve Jobs was many things to many people.  He liberated the innovations of XEROX PARC and brought them to the masses through yet more innovation.  He brought style together with technology and forged an unending bond between the two.  Apple has revolutionized the world multiple times over 3 decades under his leadership, and proved that it was his leadership that made the difference when the company almost failed after his leaving.   On and on….

… but for me, I will remember him most of all for NeXT.

Rest in peace Steve.  We’ll miss you.  Thanks for leaving the world with a dominant OS that is UNIX based on laptops that “just work”.

Solaris Family Reunion: TOMORROW!

October 3rd, 2011

Sorry for the late notice, but all you folks out here in the Bay Area for OracleWorld won’t want to miss out on a very exciting event tomorrow night:

  • What? Solaris Family Reunion
  • Where? Joyent HQ, 345 California St, 20th Floor
  • When? Tuesday Oct 4th, 6PM till 10PM (and maybe a pub after that!)
  • Why? Beer! Food! Community!
  • Register here: http://smartos-estw.eventbrite.com

We’ve all gone off in different directions, but this will be an amazing and rare opportunity to get the band back together, share stories and talk about the future and just have a good time as a Solaris community.  You will not want to miss it!

Password Myths

August 10th, 2011

XKCD always has something interesting and funny to say.  This one made me think a bit:

We all know longer is better than more funky, but we rarely do it in practice.  I’ve seen plenty of passwords in my time and they are almost always 6-8 chars. Why?  Least common denominator of course. The truth is that most people (even IT people) re-use the same password over and over, so they pick one that works with everything, meaning 8 chars long with an alphanumeric mix.

I remember the first time I used a program that supported and encouraged long passwords… it was PGP, which called them pass phrases.   Frankly, I wish all use of the word “password” was replaced with “pass phrase” as it instantly changes your perception into something more useful.
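To put a rough number on why long pass phrases hold up, here is a small Python sketch of the arithmetic behind the comic. It assumes every word is picked independently and at random from a fixed list; the comic's figure of about 44 bits for four words corresponds to a list of 2048 words (11 bits each). The function name is hypothetical, and words chosen because they are your favorites are not random, so treat the result as an upper bound.

```python
import math


def passphrase_bits(word_count, wordlist_size):
    """Entropy in bits of word_count words drawn at random from a list."""
    return word_count * math.log2(wordlist_size)


if __name__ == "__main__":
    for words in range(3, 7):
        bits = passphrase_bits(words, 2048)
        print(f"{words} words from a 2048-word list: {bits:.0f} bits")
```

Every extra word adds the same number of bits, which is why length pays off faster than swapping a few letters for digits.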

Most UNIX systems now use SHA or MD5 as the default scheme, which allows up to 255 chars for your password.  So that’s not a limitation anymore.  But what about most web sites?   I thought I’d use the model XKCD offers as a test.  I created a pass phrase that is simply my 4 favorite things, in order, with spaces in between and the first char of each word capitalized.  No digits, no punctuation.  The 4 words plus spaces come out to 29 chars.  Then I changed my password on some popular sites to see if it would work.  Here are the results:

  • Facebook: Works
  • Google (Gmail/Youtube): Works
  • Twitter: Works, but spaces are not allowed.
  • Yahoo (Yahoo Mail): Works (See below)
  • Reddit: Works
  • Digg: Works

Funny thing happened when I changed my Yahoo password: it switched my language preference to Vietnamese for some reason.  And, to make it all the more bizarre, there is no obvious place to change my language preference back.  I guess I’ll have to use Google Translate to fix my Yahoo account.

So, go ahead, change your password to something easier to remember and more secure, and let go of your old standby.

PS: If you’re managing systems… for heaven’s sake, turn on account locking and consider using Duo.

Happy SysAdmin Day

July 29th, 2011

It’s here again, your favorite day of the year: SA Day.

Several fun things to check out today, Matt has a good list of them in his blog today.

From Cuddletech, have a good day watching YouTube videos and feeling good about yourself for once, put down the ticket queue and plan your next vacation, and drink some really good beer tonight.

Here’s a picture of Nova (my eldest daughter) on her first day as an SA, earlier this year, doing her first ever drive swap in the data center:

Nothing New Under the Sun: An Introduction to Operations Management (OM)

July 21st, 2011

8 All things are full of weariness;
a man cannot utter it;
the eye is not satisfied with seeing,
nor the ear filled with hearing.
9 What has been is what will be,
and what has been done is what will be done,
and there is nothing new under the sun.
10 Is there a thing of which it is said,
“See, this is new”?
It has been already
in the ages before us.
11 There is no remembrance of former things,
nor will there be any remembrance
of later things  yet to be
among those who come after.

 

Ever been irritated by the subtle but constant reference by Agile and DevOps people to manufacturing?  You may not even realize they are doing it, but you’ll hear reference to a book called “The Goal”, quotes from Deming, analogies to factories, etc.  In many conference talks I could feel that there was some larger body of knowledge that speakers were alluding to, but not fully describing.  What was this secret knowledge?  Last year I finally stumbled upon the answer and I’ve been consumed by it ever since… long time readers of my blog will note a considerable change in tone and subject since Dec of last year.

This secret body of knowledge that is all around you, but not directly named, is “Operations Management” (OM).

Classically, it is said that a company is made up of 3 primary organizational divisions: Finance, Marketing (which includes Sales), and Operations.  Finance handles the books and internal resources, Marketing brings the market to the company and sells its products to that market, and Operations is the part of the company that does what your company does.  This is an overly simplistic model, but it makes a complex organization easier to grok.  If you run a hot dog stand, “operations” refers to ordering hot dog stuff, making hot dogs, serving customers, etc.  If you make cars, “operations” refers to the factory floor managing supply chain, operating the assembly line, and delivering cars to dealers.  If you run a web site, “operations” refers to the developers and sysadmins who make the product, run it, etc.  So again, the model breaks down to bean counters, sellers, and makers/doers.

Have you ever thought about getting an MBA?  I have.  Except, when I looked at the curriculum my eyes somehow danced right over OM, because I didn’t know what I was looking for.  Now I know.  You can examine the OM departments at Harvard Business School and MIT Sloan.  As with so many things today, the first step to knowledge is knowing what to look for; if you don’t know what it’s called you can search until you’re blue in the face and find nothing of real value.

My journey really took off when I found, at Church of all places, a donated text book entitled Fundamentals of Operations Management (4e).  “WOW!” I thought, “that’s what I’ve been looking for!”  One look at the table of contents and I knew I’d stumbled onto the elusive body of knowledge I’d sought for so long:

  1. Introduction to Operations Management
  2. Operations Strategy: Defining How Firms Compete
  3. New Product and Service Development, and Process Selection
  4. Project Management
  5. The Role of Technology in Operations
  6. Process Measurement and Analysis
  7. Financial Analysis in Operations Management
  8. Quality Management
  9. Quality Control Tools for Improving Processes
  10. Facility Decisions: Location and Capacity
  11. Facility Decisions: Layouts
  12. Forecasting
  13. Human Resource Issues in Operations Management
  14. Work Performance: Measurement
  15. Waiting Line Management
  16. Waiting Line Theory
  17. Scheduling
  18. Supply Chain Management
  19. Just-in-Time Systems
  20. Aggregate Planning
  21. Inventory Systems for Independent Demand
  22. Inventory Systems for Dependent Demand

Jackpot!  If more than half of those chapters don’t seem pertinent to IT departments, then you’ve never tried to manage one.  The focus may be slightly different, but the core issues, problem domains, and related disciplines are essentially identical.  This explains why so many “experts” are making reference to OM, knowingly or unknowingly, because in manufacturing they dealt with the same problems, in essence, we have in IT.  The Web companies (Twitter, Facebook, Flickr/Etsy, etc) are the ones leading the charge because more than traditional IT organizations, they really do look like the factory floor producing a single line of products.

So now… now I know what questions to ask.  And ask I did.  This opened up a whole new world to me that was right under my nose.  The Toyota Production System (TPS) which became known in the US as “Lean”… W. Edwards Deming and Total Quality Management (TQM)… ISO-9001…. the undertones of ITIL, CobiT, ISO-27001, and Agile…. it all came together and made sense for the first time.

This sent me into an epic journey as I sought out book after book after book by the cornerstone individuals of OM, because they all wrote books that formed the modern body of knowledge.  I now own all of Henry Ford’s books, Shigeo Shingo’s books, Taiichi Ohno’s books, W. Edwards Deming’s books, Walter Shewhart’s book, Frederick Winslow Taylor’s book, Ludwig von Bertalanffy’s books, Peter Drucker’s books, and on and on and on.  I couldn’t stop buying and reading these texts that describe the world we find ourselves in today, shaped by the work they did so long ago.  All these points in my head started to be connected, one by one, and a fabric of knowledge appeared.

Friends, the point is this: there is nothing new under the sun.  Things change, evolve, and morph, sure, but the principles are not new.  If they were, we wouldn’t look back at Plato and Aristotle as wise today; much of what they debated 2400 years ago is just as pertinent now.  So it is with Agile and DevOps: the core principles have been well explored and addressed in the last century of manufacturing as part of Operations Management.  We only need adapt that knowledge, and the “experts” are doing exactly that.

Consider an example.  As a consequence of the innovations Ohno was introducing at Toyota in building the Toyota Production System (TPS, aka Lean), and in particular that of Kanban (the basis of Just-in-Time production, which is pull rather than push based production), he needed a way to speed up the “changeover time” (setup time) of large pressing machines.  These machines contain “dies” which press sheet metal into, say, a car door.  The changeover time could be as much as 6 hours… that means, when you decide to stop making part A and want to make part B, you have to shut down for 6 hours to set up the machine for the new part before starting production again.  The way this was typically handled was to simply make a shitload of parts to build up a big inventory so that you reduced the likelihood of needing to do another setup.  They were after local efficiency (what the “Theory of Constraints” calls local optima) at all costs.  This mass production method wasn’t going to work in Ohno’s new just-in-time world; the idea of stamping out only 20 parts and then changing to create another was completely idiotic.  At least, it was until he put Shigeo Shingo on the job.  It took Shingo years to make it happen, but ultimately he created a method known as “Single Minute Exchange of Dies” (SMED).  With his method you can change dies in less than 10 minutes (single-digit minutes, not 60 seconds).  This was the breakthrough that Ohno needed to make Kanban really work… and work it did.  Without SMED, a technology approach, to complement Ohno’s other methods (Kanban, 5S, 5W, Andon, Muda, etc) Toyota just wouldn’t have been the industrial revolutionary that they became.
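The arithmetic behind that is worth a quick sketch. Using only the figures above (a 6 hour changeover, a batch of 20 parts, and 10 minutes as the upper bound for a SMED changeover), this Python snippet spreads the setup time over the batch. The function names are hypothetical and the press time per part is deliberately left out, since it is the same in both cases.

```python
def setup_minutes_per_part(changeover_minutes, batch_size):
    """Changeover time amortized over every part in the batch."""
    return changeover_minutes / batch_size


def batch_size_for(changeover_minutes, target_minutes_per_part):
    """Batch size needed to get setup overhead down to a target per part."""
    return changeover_minutes / target_minutes_per_part


if __name__ == "__main__":
    before = setup_minutes_per_part(6 * 60, 20)  # 6 hour changeover
    after = setup_minutes_per_part(10, 20)       # SMED changeover
    print(f"Setup overhead per part, 6 hour changeover: {before} min")
    print(f"Setup overhead per part, 10 minute changeover: {after} min")
    print(f"Batch needed to match SMED without SMED: {batch_size_for(6 * 60, after)}")
```

With the old changeover each of the 20 parts carries 18 minutes of setup overhead, against half a minute with SMED, and matching that without SMED would take a batch of 720 parts, which is exactly the big-inventory habit Ohno was trying to break.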

Now, why the hell am I telling you all that?  Look at what cloud did to IT.  Just like Kanban, Cloud came along and showed us that our setup times are way too long, and changeover from one type of setup to another was awful.  Configuration Management tools (CFengine, Chef, Puppet, etc) are the SMED of our industry.  Same problems, same needs, different solutions, but similar approaches.  There is no reason for us to re-invent all the wheels; a lot of these issues are solved problems, if you just know where to look and what questions to ask, and have an open mind.

If you are like me and have been looking for something, but you know not what, go find yourself a book on Operations Management and get your journey started.  You’ll have a massive head start over all your peers who won’t figure this out for another couple years (just as others already got a head start over us).

Three Aspects of DevOps: What’s in a word

June 24th, 2011

Cloud.  DevOps.  Both are in the fad category, but both are very popular and everyone is grasping at what they really are.  There is a subtle difference however.  “Cloud” is ambiguous, which leads to the never ending line of questions “What is Cloud?”  and yet more as the concept evolves such as “If cloud means in the cloud, then isn’t private cloud an oxymoron?”  DevOps on the other hand seems deceptively intuitive.  This has caused confusion in the ongoing conversation because different people mean different things.

I see it as 3 distinct definitions and I’m going to lay them out to help people start to refine their thinking.  To facilitate this I’m going to take dev and ops as keywords and add an operator.  The operator determines which methodology is adopted by the other camp.

Now let me go a step further and suggest that these are not simply aspects of devops, they are in fact the 3 phases of what is collectively the “devops transformation”.

Phase I: Dev > Ops

In this phase development methodology and mentality are adopted by operations.  My estimate is that this represents about 90% of the devops movement.  This is where the DevOps movement started and where most of its focus is today.  Several things happen in this phase:

  • IT groups and systems/network administrators re-realize themselves as “operations”.  Let us not forget that this is a new concept to many people; they don’t think of themselves as “operations”, they think of themselves as IT.  If you’re running a website this is a fairly natural fit, but for traditional IT groups this is an area of contention in and of itself.  In the enterprise space you’re not “operating” a website, you’re “operating” a business.
  • Agile is slowly adopted and adapted into operations.  In many cases this means stripping agile down to its first principles and its Lean roots.  It’s a matter of taking existing practices such as ITIL and marrying them with agile principles.  This is slow and individually tailored to each company as many folks have found that things like SCRUM don’t work for ops, but visual workflow and control of work in progress (meaning, the inappropriately named “Kanban”) do.  Finding balance between Peter Drucker’s “doing things right and doing the right thing” takes time.
  • Re-tooling for the virtualized world.  I could say “cloud world” but that’s inaccurate, since the problems are the same if you have a large internal VMware deployment or an external AWS deployment.  This is where most of the action has been in DevOps so far and what the DevOps Toolchain Project has been about.  This is the draw, in particular, to configuration management (in the automation sense, not the ITIL one) and is helped along by the 3 companies really driving the publicity of DevOps, those being Puppet Labs, OpsCode and DTO Solutions.
  • Monitoring gets kicked up a gear.  Just as virtualization causes you to re-evaluate your tools for configuration management and command-and-control (now being called “distributed orchestration”) your monitoring needs to step up to the new challenges as well.  This is where you will challenge your existing monitoring system, expand its functionality and re-consider your logging and trending strategies.  Maybe everything is up to snuff already, but with all the recent additions to the alerting/logging/trending category you’ll inevitably try some new things and get over your fear of using tools written in Ruby. :)
  • etc.

Phase II: Dev < Ops

In this phase operations methodology and mentality are adopted by developers.   My estimate is that this represents less than 10% of the devops movement.  This phase generally represents the bonding of the two groups and is easily confused with Phase I.  Things that happen in this phase are:

  • Metrics everywhere.  This is something championed by John Allspaw, the collection of metrics from everywhere.  In Phase I you may have started collecting metrics but they were by Ops for Ops, however in this phase Dev is actually interested in the metrics and they are more business focused, so metrics aren’t just coming from the OS but are also coming from application code.  This is where dashboards start to be created to facilitate the wide absorption of metrics.
  • Continuous Integration is implemented or evaluated.  Ops alone can’t implement CI, so inevitably cooperation forms around it.
  • Cross-training of tools and practices.  This is when developers take a genuine interest in day-to-day operations activities and challenges, and start aligning the toolsets between both groups.
  • etc.

 

Phase III: Dev <> Ops

In this phase developers and operations unify, sharing responsibilities and practices.  I think this is an underlying principle of the movement, and frankly is what DevOps really is about.  This is the magical destination of your journey, a far country where Adam Jacob rides unicorns and children spend time with their OpsDads and everyone sings drinking songs together at the pub.  The DevOps movement is so concentrated on Phases I and II that this is still an uncrystallized space, but it is what you are driving towards.  The things that emerge in this phase are:

  • Fully shared responsibility in a “no finger-pointing” environment.  Dev may build it, Ops may deploy and maintain it, but both parties are fully committed to success, and personal responsibility doesn’t end after the code is committed or deployed or whatever.  This is where post-mortems involve both teams, and this is where performance problems are solved in dev/op pairs working together.  When things come up, you don’t just re-assign the ticket, you get together and work it as a single unified team.
  • Developers are on-call.  This is popularized by WebOps shops and is not directly applicable to enterprises, but the principle still applies.  There should be an on-call rotation in development for problems which may be attributed to code they’ve written, and developers should fully accept themselves as capable first-responders.
  • Integrated Continuous Improvement.  At this stage there aren’t 2 teams anymore, there is one large interdisciplinary team of professionals.  New tools, new practices, new projects, etc. are presented to the whole team whether coder or sysadmin, so that both groups can bring their capabilities to the table as we continually improve.  Just as in TOC, we do not want Inertia (i.e. “the new status quo”) to slow us down by making us complacent.
  • etc.

 

Framing the Conversation

Are you in the midst of a “DevOps Transformation”?  More likely the vast majority of you are just modernizing your existing operations practices and tools.  Perhaps your usual weekly dev and ops meetings have simply been renamed “DevOps Weekly Meeting”.  You might think that Phase III is impossible in your environment but there are some interesting things in I and II.  Everyone is in a different place, but there is a natural and inevitable progression here.  The first step on that road is changing the culture, and the DevOps movement has caused that to happen.

From this model you can see why using “DevOps” as a title or job description is problematic.  Which of the 3 do you mean?  Do you mean a sysadmin in Phase I who is up on the new wave of operations tools and practices?  Do you mean a developer in Phase II who is using feature-flags and continuous integration?  Do you mean any IT worker who is culturally savvy to working together and serving the common business needs?  Who are you talking about?

The parts that make up what “DevOps” is are not new.  What’s new is the culture shift that is being accepted in places it wasn’t previously.  In years past employees that tried to “cross the lines” would often be beaten down as being overly nosy or over-achievers or whatever.  DevOps may be a fad, just as cloud was, but it’s opened up a world of possibilities that were previously closed to us.  Once using AWS was unthinkable, now it’s almost expected.  Once trashing Tivoli for an Open Source solution was crazy, now it’s welcomed.  Once asking dev for metrics in code was laughed at, now it’s applauded.  So embrace this time in the history of our industry and seize the new opportunities by keeping that conversation alive.

 

Using Graphite to Graph DTrace Metrics

June 21st, 2011

If you haven’t heard of Graphite you are missing out on a serious operations power tool. Let me make a gross oversimplification and slightly inaccurate assertion to get you in the ballpark of understanding what it is: it’s RRDtool reimplemented for the web.

Let me be more specific for those new to it. Graphite is really made up of 3 components. The first is “Carbon”, a metrics collection daemon that receives data over a network socket (plain TCP on port 2003 by default, with UDP as an option), caches the data and then records it to disk. The second is “Whisper”, the round robin database that Carbon uses to permanently store your metrics on disk. The third is a Django app which can generate graphs based on your metrics via a snazzy web UI or via a simple URL API. So it implements a round robin database like RRDtool and a means of graphing the data like RRDtool, but it’s accessible via a browser and graphs dynamically, so unlike RRDtool it isn’t necessary to pre-render static graphs at some interval.

There are 3 reasons I really find it hard to ignore Graphite. Firstly, you do not need to pre-generate your databases: if you send it a metric it hasn’t seen before, it just creates the database based on a flexible schema configuration. Secondly, you can get your graphs essentially in real-time by just refreshing a URL, no pre-generation. Thirdly, you can send it metrics using something as simple as netcat. The result is an insanely flexible metrics graphing system with very little configuration required and no agents necessary.

So let me demonstrate how we can use all this power together with DTrace in a sample script:

#!/bin/bash
# Example DTrace/Graphite Integration
# Ben Rockwood 

export HOSTNAME=`hostname`
export GRAPHITE_SERVER="10.0.0.22";

/usr/sbin/dtrace -n '

#pragma D option destructive
#pragma D option quiet

BEGIN
{
        mycounter = 0;
}

syscall::read:entry
{
        mycounter++;
}

tick-1sec
{
        /* system("echo \"DEBUG: Sending data to metric dtrace.$HOSTNAME.syscall.read.entry
                                    on server $GRAPHITE_SERVER\" "); */
        system("echo \"dtrace.$HOSTNAME.syscall.read.entry %d %d\" | nc $GRAPHITE_SERVER 2003 ",
                     mycounter, walltimestamp / 1000000000);
        mycounter = 0;
}
'

So what I’m doing here is running a DTrace script via BASH. I’m using BASH as a wrapper so that I can do setup such as getting the hostname. Because HOSTNAME and GRAPHITE_SERVER are exported, the shell that system() spawns can expand them even though the D script itself sits inside single quotes. The DTrace script itself is overly simplistic: we’re just counting read system calls by incrementing a counter. The “tick-1sec” probe fires every second, at which point it runs a system command and resets the counter. DTrace treats system() as a destructive action, so you’ll notice that pragma is set.

The system command we’re executing simply echoes the metric in Graphite’s format and pipes it to netcat (“nc”), which sends it to the Graphite server. The format is simple: “some.metric.name value epoch_time”. My metric here will be dtrace.newton.syscall.read.entry. (Newton is my workstation.)
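The same line format can be produced outside of DTrace. What follows is only a minimal sketch: the helper name graphite_line and the metric name test.newton.example are made up for illustration, while the server address and port are the ones used in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Graphite): print one metric line
# in the plaintext form "some.metric.name value epoch_time".
graphite_line() {
    local name="$1" value="$2"
    # Default to the current epoch time when no timestamp is given.
    local when="${3:-$(date +%s)}"
    printf '%s %s %s\n' "$name" "$value" "$when"
}

# Print a line with a fixed timestamp, without sending it anywhere.
graphite_line "test.newton.example" 42 1308700000

# To actually send it, pipe into netcat as the DTrace script does:
#   graphite_line "test.newton.example" 42 | nc 10.0.0.22 2003
```

Keeping the formatting in its own function means the line can be inspected on the terminal before anything is piped to the server.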

I start that running and then go to the following URL:


http://10.0.0.22:8888/render/?width=400&height=250&target=dtrace.newton.syscall.read.entry&from=-1hours

And this is what I see:

See how flexible it is? If I wanted to run this on 4 web servers I could fire up the script, unmodified, on all 4 servers and then simply modify the URL to change the hostname in the target from “newton” to “*” and it would graph all 4 together, without having to even log onto the Graphite server. This is why I love Graphite: it’s so flexible you can pretty much cram it in anywhere and get useful data in a pinch.
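To make the wildcard idea concrete, here is a small sketch that builds the render URL for a given host pattern. The helper name render_url is hypothetical; the base address and query parameters are simply copied from the URL shown earlier.

```shell
#!/bin/bash
# Hypothetical helper: build a Graphite render URL for one host pattern.
# The second argument (base URL) is optional and defaults to the server
# address used in this post.
render_url() {
    local host="$1" base="${2:-http://10.0.0.22:8888}"
    printf '%s/render/?width=400&height=250&target=dtrace.%s.syscall.read.entry&from=-1hours\n' \
        "$base" "$host"
}

render_url newton   # a single host
render_url '*'      # every host reporting this metric
```

The asterisk is quoted so the shell passes it through literally instead of expanding it against local file names.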

Word of warning: The script above is intentionally oversimplified. My point here is to illustrate the basic principles, nothing more.