Archive for June, 2011

Three Aspects of DevOps: What’s in a word

Friday, June 24th, 2011

Cloud.  DevOps.  Both are in the fad category, but both are very popular and everyone is grasping at what they really are.  There is a subtle difference, however.  “Cloud” is ambiguous, which leads to the never-ending line of questions: “What is Cloud?” and, as the concept evolves, “If cloud means in the cloud, then isn’t private cloud an oxymoron?”  DevOps, on the other hand, seems deceptively intuitive.  This has caused confusion in the ongoing conversation because different people mean different things by it.

I see it as 3 distinct definitions and I’m going to lay them out to help people start to refine their thinking.  To facilitate this I’m going to take dev and ops as keywords and add an operator.  The operator determines which methodology is adopted by the other camp.

Now let me go a step further and suggest that these are not simply aspects of devops, they are in fact the 3 phases of what is collectively the “devops transformation”.

Phase I: Dev > Ops

In this phase development methodology and mentality are adopted by operations.  My estimate is that this represents about 90% of the devops movement.  This is where the DevOps movement started and where most of its focus is today.  Several things happen in this phase:

  • IT groups and systems/network administrators re-realize themselves as “operations”.  Let us not forget that this is a new concept to many people; they don’t think of themselves as “operations”, they think of themselves as IT.  If you’re running a website this is a fairly natural fit, but for traditional IT groups this is an area of contention in and of itself.  In the enterprise space you’re not “operating” a website, you’re “operating” a business.
  • Agile is slowly adopted and adapted into operations.  In many cases this means stripping agile down to its first principles and its Lean roots.  It’s a matter of taking existing practices such as ITIL and marrying them with agile principles.  This is slow and individually tailored to each company, as many folks have found that things like Scrum don’t work for ops, but visual workflow and control of work in progress (meaning the inappropriately named “Kanban”) do.  Finding balance between Peter Drucker’s “doing things right and doing the right thing” takes time.
  • Re-tooling for the virtualized world.  I could say “cloud world” but that’s inaccurate, since the problems are the same whether you have a large internal VMware deployment or an external AWS deployment.  This is where most of the action has been in DevOps so far and what the DevOps Toolchain Project has been about.  This is the draw, in particular, to configuration management (in the automation sense, not the ITIL one) and is helped along by the 3 companies really driving the publicity of DevOps, those being Puppet Labs, Opscode and DTO Solutions.
  • Monitoring gets kicked up a gear.  Just as virtualization causes you to re-evaluate your tools for configuration management and command-and-control (now being called “distributed orchestration”) your monitoring needs to step up to the new challenges as well.  This is where you will challenge your existing monitoring system, expand its functionality and re-consider your logging and trending strategies.  Maybe everything is up to snuff already, but with all the recent additions to the alerting/logging/trending category you’ll inevitably try some new things and get over your fear of using tools written in Ruby. :)
  • etc.

Phase II: Dev < Ops

In this phase operations methodology and mentality are adopted by developers.   My estimate is that this represents less than 10% of the devops movement.  This phase generally represents the bonding of the two groups and is easily confused with Phase I.  Things that happen in this phase are:

  • Metrics everywhere.  This is something championed by John Allspaw, the collection of metrics from everywhere.  In Phase I you may have started collecting metrics but they were by Ops for Ops, however in this phase Dev is actually interested in the metrics and they are more business focused, so metrics aren’t just coming from the OS but are also coming from application code.  This is where dashboards start to be created to facilitate the wide absorption of metrics.
  • Continuous Integration is implemented or evaluated.  Ops alone can’t implement CI, so inevitably cooperation forms around it.
  • Cross-training of tools and practices.  This is when developers take a genuine interest in day-to-day operations activities and challenges, and both groups start aligning their toolsets.
  • etc.

 

Phase III: Dev <> Ops

In this phase developers and operations unify, sharing responsibilities and practices.  I think this is an underlying principle of the movement, and frankly is what DevOps really is about.  This is the magical destination of your journey, a far country where Adam Jacob rides unicorns and children spend time with their OpsDads and everyone sings drinking songs together at the pub.  The DevOps movement is so concentrated on Phases I and II that this is still an uncrystallized space, but it is what you are driving towards.  The things that emerge in this phase are:

  • Fully shared responsibility in a “no finger-pointing” environment.  Dev may build it, Ops may deploy and maintain it, but both parties are fully committed to success and personal responsibility doesn’t end after the code is committed or deployed or whatever.  This is where post-mortems involve both teams, and where performance problems are solved in dev/op pairs working together.  When things come up, you don’t just re-assign the ticket, you get together and work it as a single unified team.
  • Developers are on-call.  This is popularized by WebOps shops and is not directly applicable to enterprises, but the principle still applies.  There should be an on-call rotation in development for problems which may be attributed to code they’ve written, and developers should fully accept themselves as capable first responders.
  • Integrated Continuous Improvement.  At this stage there aren’t 2 teams anymore, there is one large interdisciplinary team of professionals.  New tools, new practices, new projects, etc. are presented to the whole team whether coder or sysadmin, so that both groups can bring their capabilities to the table as we continually improve.  Just as in TOC, we do not want Inertia (i.e. “the new status quo”) to slow us down by making us complacent.
  • etc.

 

Framing the Conversation

Are you in the midst of a “DevOps Transformation”?  More likely the vast majority of you are just modernizing your existing operations practices and tools.  Perhaps your usual weekly dev and ops meetings have simply been renamed “DevOps Weekly Meeting”.  You might think that Phase III is impossible in your environment, but there are some interesting things in I and II.  Everyone is in a different place, but there is a natural and inevitable progression here.  The first step on that road is changing the culture, and the DevOps movement has caused that to happen.

From this model you can see why using “DevOps” as a title or job description is problematic.  Which of the 3 do you mean?  Do you mean a sysadmin in Phase I who is up on the new wave of operations tools and practices?  Do you mean a developer in Phase II who is using feature-flags and continuous integration?  Do you mean any IT worker who is culturally savvy to working together and serving the common business needs?  Who are you talking about?

The parts that make up what “DevOps” is are not new.  What’s new is the culture shift that is being accepted in places it wasn’t previously.  In years past, employees who tried to “cross the lines” would often be beaten down as being overly nosy or over-achievers or whatever.  DevOps may be a fad, just as cloud was, but it’s opened up a world of possibilities that were previously closed to us.  Once using AWS was unthinkable, now it’s almost expected.  Once trashing Tivoli for an Open Source solution was crazy, now it’s welcomed.  Once asking dev for metrics in code was laughed at, now it’s applauded.  So embrace this time in the history of our industry and seize the new opportunities by keeping that conversation alive.

 

Using Graphite to Graph DTrace Metrics

Tuesday, June 21st, 2011

If you haven’t heard of Graphite you are missing out on a serious operations power tool. Let me make a gross oversimplification and a slightly inaccurate assertion to get you in the ballpark of understanding what it is: it’s RRDtool reimplemented for the web.

Let me be more specific for those new to it. Graphite is really made up of 3 components. The first is “Carbon”, a metrics collection daemon that receives data over a network socket (the script below reaches its plaintext listener on port 2003 over TCP with netcat), caches the data and then records it to disk. The second is “Whisper”, the round robin database that Carbon uses to permanently store your metrics on disk. The third is a Django app which can generate graphs from your metrics via a snazzy web UI or via a simple URL API. So, like RRDtool, it implements a round robin database and a means of graphing the data, but it’s accessible via a browser and graphs dynamically, so unlike RRDtool it isn’t necessary to pre-render static graphs at some interval.

There are 3 reasons I really find it hard to ignore Graphite. Firstly, you do not need to pre-generate your databases; if you send it a metric it hasn’t gotten before, it just creates the database based on a flexible schema configuration. Secondly, you can get your graphs essentially in real-time by just refreshing a URL, no pre-generation. Thirdly, you can send it metrics using something as simple as netcat. The result is an insanely flexible metrics graphing system with very little configuration required and no agents strictly necessary.
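To make the netcat point concrete, here is a minimal sketch of sending a single metric by hand. It assumes Graphite's plaintext format of “metric.name value epoch_time” and a listener on port 2003; the function names and the metric name are made up for illustration, and GRAPHITE_SERVER is a placeholder for your own server.

```shell
#!/bin/bash
# Sketch only: graphite_line and send_metric are illustrative helper names,
# not part of Graphite itself.

# Print one line in Graphite's plaintext format: "name value epoch_seconds".
# $1 = metric name, $2 = value, $3 = epoch seconds (defaults to now)
graphite_line() {
    local name="$1" value="$2" when="${3:-$(date +%s)}"
    printf '%s %s %s\n' "$name" "$value" "$when"
}

# Pipe that line to the Graphite server with netcat.
# Assumes GRAPHITE_SERVER is set and the plaintext listener is on port 2003.
send_metric() {
    graphite_line "$@" | nc "$GRAPHITE_SERVER" 2003
}

# Show what would go over the wire, without touching the network.
graphite_line test.newton.example 42 1308700000
```

Calling send_metric with the same arguments would ship that line to the server; some netcat variants need an extra flag to exit once the input ends, so check your local man page.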

So let me demonstrate how we can use all this power together with DTrace in a sample script:

#!/bin/bash
# Example DTrace/Graphite Integration
# Ben Rockwood

# Exported so the shell spawned by DTrace's system() action can expand them.
export HOSTNAME=$(hostname)
export GRAPHITE_SERVER="10.0.0.22"

/usr/sbin/dtrace -n '

#pragma D option destructive
#pragma D option quiet

BEGIN
{
        mycounter = 0;
}

/* Count every read system call. */
syscall::read:entry
{
        mycounter++;
}

/* Once a second: send the count to Graphite, then reset it. */
tick-1sec
{
        /* system("echo \"DEBUG: Sending data to metric dtrace.$HOSTNAME.syscall.read.entry
                                    on server $GRAPHITE_SERVER\" "); */
        system("echo \"dtrace.$HOSTNAME.syscall.read.entry %d %d\" | nc $GRAPHITE_SERVER 2003 ",
                     mycounter, walltimestamp / 1000000000);
        mycounter = 0;
}
'

So what I’m doing here is running a DTrace script via BASH. I’m using BASH as a wrapper so that I can do setup such as getting the hostname. The DTrace script itself is overly simplistic: we’re just counting read system calls by incrementing a counter. The “tick-1sec” probe fires every second, at which point it runs a system command and then resets the counter. DTrace treats system() as a destructive action, which is why the “destructive” pragma is set.

The system command we’re executing simply echoes the metric in Graphite’s format and pipes it to netcat (“nc”), which sends it to the Graphite server. The format is simple: “some.metric.name value epoch_time”. My metric here will be dtrace.newton.syscall.read.entry. (Newton is my workstation.)

I start that running and then go to the following URL:


http://10.0.0.22:8888/render/?width=400&height=250&target=dtrace.newton.syscall.read.entry&from=-1hours

And this is what I see:

See how flexible it is? If I wanted to run this on 4 web servers I could fire up the script, unmodified, on all 4 servers and then simply modify the URL to change the hostname in the target from “newton” to “*” and it would graph all 4 together, without even having to log onto the Graphite server. This is why I love Graphite: it’s so flexible you can pretty much cram it in anywhere and get useful data in a pinch.
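The hostname-to-wildcard swap can be sketched as a small helper that builds the render URL. The parameters are the ones from the URL above; the function name render_url is made up for illustration, and the host and port are the ones from my setup.

```shell
#!/bin/bash
# Sketch only: render_url is an illustrative helper, not a Graphite tool.

# Build a Graphite render URL.
# $1 = web host:port, $2 = target (may contain wildcards), $3 = "from" (default -1hours)
render_url() {
    local base="$1" target="$2" from="${3:--1hours}"
    printf 'http://%s/render/?width=400&height=250&target=%s&from=%s\n' \
        "$base" "$target" "$from"
}

# One host, as in the URL above:
render_url 10.0.0.22:8888 'dtrace.newton.syscall.read.entry'

# Every host that reports this metric, via a wildcard in the hostname position:
render_url 10.0.0.22:8888 'dtrace.*.syscall.read.entry'
```

Quote the target when you call it so the shell does not try to expand the “*” itself.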

Word of warning: The script above is intentionally oversimplified. My point here is to illustrate the basic principles, nothing more.

Consolidated Alerting Using PagerDuty

Sunday, June 19th, 2011

There are a lot of interesting SaaS offerings available today but not many that get me all excited.  I recently blogged about Duo Security, they get me excited.  Another of my favorites is Mint.  The most recent has been PagerDuty.  I met the PagerDuty team at Velocity & DevOps Days this week and they are a really awesome bunch of folks and so I thought I’d give them a little love to show my support.  (This isn’t sponsored, in all my years I’ve never made a dime from cuddletech and I never intend to.)

We all have several things in our infrastructures that alert.  You probably have multiple monitoring systems, lots of software with built-in alerting capabilities, logging systems that alert, even external SaaS such as Pingdom, Circonus, New Relic, Keynote, whatever.  Managing all those contacts is a royal PITA.  If you have a NOC you can just have them all send alerts to a mailing list which someone monitors and escalates as necessary, but if you have a small to midsize team it’s not realistic to have someone watching a list 24×7.

PagerDuty has a great many features but consolidation is by far the most exciting for me.  Within PagerDuty you add each member of your staff as a user and they themselves can add various contact methods which are escalated over time.  For instance, you may want an email immediately when an incident occurs, an SMS at +5 minutes, a phone call at +10 minutes, a phone call to your land line at +15 minutes, etc.  Then you create services which represent different alerting systems.  So I have one for Zabbix, another for OpenNMS, another for Pingdom, etc.  Each service has its own email address: serviceX@mycompany.pagerduty.com.  Now you simply go to all your various alerting systems and point them at the service email address rather than at you.
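To illustrate how the per-user escalation plays out over time, here is a toy sketch of the example schedule above. It is only a model of the behaviour, not PagerDuty's API or configuration format; the function name and the labels are made up.

```shell
#!/bin/bash
# Toy model of the example schedule: email at +0, SMS at +5,
# a phone call at +10 and a land line call at +15 minutes.
# notifications_due is an illustrative name, not a PagerDuty command.

# $1 = minutes since the incident opened without being acknowledged.
# Prints every contact method that would have fired by then.
notifications_due() {
    local minutes="$1"
    [ "$minutes" -ge 0 ]  && echo "email"
    [ "$minutes" -ge 5 ]  && echo "sms"
    [ "$minutes" -ge 10 ] && echo "phone call"
    [ "$minutes" -ge 15 ] && echo "land line call"
    return 0
}

# Seven minutes in, the email and the SMS have gone out.
notifications_due 7
```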

All this is layered up with multiple escalation lists and on-call rotation schedules which automatically change the primary and escalated contacts for one or more services.  The great beauty is that if someone goes on vacation I make a change in one place rather than 10.

In my case I only send “wake me up” grade alerts to PagerDuty, everything else goes to internal mailing lists, Jabber, etc.

Additionally, if you’re into the DevOps idea of putting your developers on-call, PagerDuty is a great way to facilitate that.

The service is a little pricy imho, but well worth the money.  They provide a free 30 Day eval period during which you can send all the SMS’s and calls you wish at no charge, so you have absolutely no reason not to at least give it a spin around the block.

Who You Are & Who You Wish To Be

Saturday, June 18th, 2011

The following is the video of Conan O’Brien delivering the 2011 Dartmouth College Commencement Address.  Watch it; if your time is short, skip to 19:40.

This is an incredibly insightful point that Conan makes: the difference between who you want to be and who you actually are is what truly makes you special and unique.

This profoundly resonates with me. I have many heroes and I’m incredibly blessed to know many of them. I want to be that fine combination of dreamer and engineer that are Bryan Cantrill, Jeff Bonwick, and Carsten Haitzler. I want to be the operational manager that is John Allspaw. I want to have the contagious enthusiasm of Adam Jacob or the conviction and pragmatism of Theo Schlossnagle. I want the wisdom of Russell Ackoff, the eloquence and curiosity of James Burke, the theological intensity of Charles Haddon Spurgeon… I could go on and on. I have a great many role models.

But the thing is, I’m not any of those people and as hard as I try I never will be… and I have tried. Often our role models are in tension with one another. However, like our tastes in music, which are similarly in conflict (I like Megadeth, and Stevie Wonder, and The Bird and the Bee, and Rachmaninoff, etc), those preferences themselves say something unique about who we are.

I find great personal comfort in what Conan says because it matches up perfectly with what I’ve found in my life’s journey thus far but have not been able to articulate. I share it with you because I know a great many other sysadmins who have similarly felt “If I can just be as good as that guy I will have made it!” Except you never become that guy, and when you at last reach that level your horizons have expanded and you realize that your course isn’t the same as his or hers.

What is profound is the point that it is the difference, not the similarity, that is truly important. That is what really makes you unique. That is the place from where you can truly contribute. Therefore, don’t ever hold yourself back or silence yourself until you reach some imaginary peak. Don’t be afraid to code just because you think your code is lousy, don’t be afraid to write just because you don’t think you’re as smart as someone else, etc. One thing I’ve found about the heroes I’ve met: while I look at where they are as my destination, they themselves are on their own journey and can’t understand how anyone else would put so much value on their position, because they themselves aren’t yet where they wish to be.

As Emerson put it: “Life is a journey, not a destination.” Whatever you have now is what someone else is wishing to aspire to, so never despair and never hold yourself back.

Duo Security: Two Factor Auth for the Masses

Thursday, June 9th, 2011

Smart Cards, OTP, Hardware Tokens like SecurID… 2 factor auth is an old standby and considered mandatory for any high security installation.  But let’s face facts, there are a myriad of problems involved.  SecurID is complex and expensive and has now destroyed its credibility following the Lockheed break-in.  Smart Cards are really sweet, especially solutions from ActivIdentity, but again they’re expensive and you have client hardware requirements which can be a problem with many users.  OTP is nifty but most of the solutions out there are ancient and may not work with the platform you’re using.  But… that is the price of security, right?  And what about all these new cloud deployments, traditional 2 factor solutions for your cloud?  Just shoot me.

Today I stumbled across Duo Security and was amazed.  It is an entirely modern 2 factor auth system that uses a SaaS model, open source client software and open APIs, integrates with just about anything, and uses the phone you already have in your pocket.

What’s amazing is that the guys at Duo have nailed the setup.  You go to Duo Security and sign up for an account; before you’ve finished registering they’ve already verified your phone via an automated voice call.  You finish the easy wizard and within 2 minutes you’re looking at their dashboard with a free account that supports up to 10 users.  For a UNIX system you download and compile their software (packages are available for Linux distros) which has a client program as well as a PAM module.  You add a new “Integration” (essentially an auth realm with its own API key), feed the keys into the client configuration (which is only 3 lines long, btw) and run the client, which gives you a URL to finish validating the host, and you’re done.  10-15 minutes after first hitting their website you are up and running 2 factor security without a bit of pain.  It’s so simple it just makes me smile… and how often does anything security related do that?

Duo supplies special variations of the service that are just as easy for Juniper, Cisco and Sonicwall VPNs as well as a Web API… but I’m not going to address those here.

Once your UNIX host is set up, you have some options on how to employ it.  You can use PAM, which will make all users dual auth via Duo, or you can use a nifty per-user SSH trick by adding command="/usr/local/sbin/login_duo" to the beginning of your public key in the .ssh/authorized_keys file (which I didn’t even know was possible).  If you don’t have the ability to modify PAM this SSH hack is a great solution.
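For the per-user SSH trick, the resulting authorized_keys entry is just the forced-command option followed by the key you already have. A minimal sketch, assuming login_duo is installed at the path above; the function name is made up and the key shown is a placeholder, not a real key.

```shell
#!/bin/bash
# Sketch only: duo_key_line is an illustrative helper, not part of duo_unix.

# Prefix an existing public key line with a forced command so that
# logins using that key are passed through login_duo first.
# $1 = existing key line, e.g. "ssh-rsa AAAA... user@host"
duo_key_line() {
    printf 'command="/usr/local/sbin/login_duo" %s\n' "$1"
}

# Placeholder key for illustration only.
duo_key_line 'ssh-rsa AAAAB3...example benr@newton'
```

Paste the printed line into .ssh/authorized_keys in place of the original key line, and keep a second session open while you test so you don’t lock yourself out.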

But what’s really important is the experience of actually using it for auth.  Here is how it works for real using an SSH session.  When you log into your system, after accepting your password or key as usual, it pauses the auth process and asks how to contact you:

Ben-Rockwoods-MacBook-Pro:~ benr$ ssh cuddletech.com
Password:
Duo login for benr

 1. Duo Push to XXX-XXX-1100
 2. Phone call to XXX-XXX-1100
 3. SMS passcodes to XXX-XXX-1100
Passcode or option (1-3): 1

Pushed a login request to your phone...

At this point the SSH is stuck.  Notice you have 3 choices: Duo Push (smartphone app), phone or SMS.  Duo Push is a free app for Android and iPhone which can accept push notifications.  When you do your setup part of the process will be installing this app if you wish, which only takes 2-3 minutes.  If you choose to use Duo Push, as I did, you’ll see something like this on your phone:

After accepting, your SSH session comes back to life:

Success. Logging you in...

Last login: Wed Jun  8 22:57:20 2011 from xxxxxx
                                __                       __
                       __      / /___  __  _____  ____  / /_
                    __/ /___  / / __ \/ / / / _ \/ __ \/ __/
                   /_  __/ /_/ / /_/ / /_/ /  __/ / / / /_
                    /_/  \____/\____/\__, /\___/_/ /_/\__/
                                    /____/
[cuddletech:~] benr$

It’s that easy!

Duo just got everything spot on: it’s easy, the documentation is clear and concise, it’s just beautiful.  The best part of it all is that it’s free for fewer than 10 users, which means that if you just have a single web server you wish to secure, you can!  Thanks to the SSH hack above you could even do it on a Shared Hosting account.  There is even a plugin for WordPress to use Duo for WP login.

To get started with it yourself, I recommend this post on the Duo blog: Announcing Duo’s two-factor authentication for Unix.  It walks you quickly through the whole process I described above.

In all fairness, I’ve been using this for less than a day, so I’m sure there are kinks I’ll run into and things to be improved, but it truly is amazing that I’ve got what feels like a solid solution working so quickly.  Auditing and logging get a lot more interesting when you don’t have to second-guess whether or not the user is in fact the user you think, and this product opens up a lot of new possibilities and fills a real gap in the world of cloud security.

NOTE FOR OPENSOLARIS/ILLUMOS PAM USERS:

After you download and unpack duo_unix-1.6.tar.gz, run “./configure --enable-pam”.  Before you run “make”, edit config.h and comment out the line “#define HAVE_ASPRINTF 1”.  After that PAM will compile fine.  If you don’t, you’ll get “pam_extra.h:10: error: syntax error before "va_list"”.  Also, make sure that you have an ‘sshd’ user for Duo to use.
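If you would rather script the config.h edit than do it by hand, something like the following sketch should work. The function name is made up, and it writes through a temporary file because not every sed supports in-place editing; the demo runs against a scratch file rather than a real source tree.

```shell
#!/bin/bash
# Sketch only: disable_asprintf is an illustrative helper name.

# Wrap the HAVE_ASPRINTF define in a C comment.
# $1 = path to config.h
disable_asprintf() {
    sed 's|^#define HAVE_ASPRINTF 1$|/* #define HAVE_ASPRINTF 1 */|' "$1" > "$1.new" \
        && mv "$1.new" "$1"
}

# Demo on a scratch file standing in for config.h.
demo=$(mktemp)
printf '#define HAVE_ASPRINTF 1\n' > "$demo"
disable_asprintf "$demo"
cat "$demo"
rm -f "$demo"

# In the unpacked duo_unix source directory the order would be:
#   ./configure --enable-pam      (flag as given in the note above)
#   disable_asprintf config.h
#   make
```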