Archive for the ‘Solaris’ Category

iPXE: Now with Native Menus and SmartOS Support

Monday, October 8th, 2012

If you’ve never heard of iPXE, it is the official fork of gPXE, which was the ultimate result of the Etherboot Project of old. Apparently there was a power struggle that caused the primary contributors to leave Etherboot/gPXE, and they renamed gPXE to iPXE to distinguish it. Technically gPXE still exists, but for all intents and purposes it’s a dead project.

If you are completely unfamiliar with both iPXE and gPXE, let me summarize. The industry standard way to network boot is via PXE. A PXE client is burned into the ROM of your NIC, but because it has to fit in a tight space it is very dumb. iPXE is an open source PXE client that is modern and very intelligent. It can execute scripts; it can inspect the system interfaces and SMBIOS; it can download images and scripts via HTTP, FTP, NFS, and more; and it has SAN support for booting off of AoE, FCoE, and iSCSI. It can be used in several ways: burned into your NIC’s ROM as a replacement (uncommon), booted from USB/ISO/etc. media, or, most typically, PXE booted itself, such that the dumb PXE client in your NIC boots iPXE and iPXE then does all the heavy lifting. If you are doing any type of network booting you should know what iPXE is, and if you ever want to do anything fancy, iPXE is the way to do it. One example many of us like to use is an iPXE script which passes information from SMBIOS and the NIC (such as serial number, service tag, MAC address, etc.) to a web app (commonly PHP), which consults a database to decide which image to boot. You can do lots of fun things. Most of your next-gen bare metal provisioning tools, such as Razor, rely on iPXE.
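
The web app side of that pattern can be sketched in a few lines. This is only an illustration, written in Python rather than PHP: the inventory table, the serial number and MAC address in it, and the function name are all hypothetical, and the image paths are simply the ones used in the menu example in this post. On the iPXE side you would chain to the app with something along the lines of "chain http://boot.example.com/boot?serial=${serial}&mac=${net0/mac}" (the hostname is made up).

```python
# Hypothetical inventory mapping a serial number or MAC address to the
# directory holding the platform image that machine should boot.
INVENTORY = {
    "ABC1234": "/sdc/20121001T165806Z",
    "00:11:22:33:44:55": "/smartos/20121004T212912Z",
}
DEFAULT_IMAGE = "/smartos/20121004T212912Z"


def boot_script(serial: str, mac: str) -> str:
    """Return the iPXE script a web app could send back for this machine."""
    image = (
        INVENTORY.get(serial.strip().upper())
        or INVENTORY.get(mac.strip().lower())
        or DEFAULT_IMAGE
    )
    lines = [
        "#!ipxe",
        f"kernel {image}/platform/i86pc/kernel/amd64/unix",
        f"initrd {image}/platform/i86pc/amd64/boot_archive",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A known serial number gets its assigned image; anything else gets the default.
    print(boot_script("abc1234", "aa:bb:cc:dd:ee:ff"), end="")
```

A real deployment would replace the dictionary with a database lookup; the point is only that the decision is made server side and iPXE just executes whatever script comes back.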

There are two really exciting things for me, just added in the last couple months.  The first and most basic is that SmartOS boots natively from iPXE.  In the past, primarily with OpenSolaris, you had to chainload PXEGRUB to boot Solaris, but it looks like some patches were accepted and now you can dump GRUB completely.

The other exciting development is the addition of native menus in iPXE. Historically, if you wanted to create a versatile netboot server you would use iPXE/gPXE to chainload SYSLINUX’s menu.c32 program, which would render your boot selection menu and boot your selected OS. But no more! iPXE can do it all on its own now thanks to the addition of 3 commands to iPXE: menu, item, and choose. With these new commands and liberal use of “goto” labels you can create some extremely complex and powerful setups with no other helper programs in the way.

Let’s take a look at a simple menu:

#!ipxe

######## MAIN MENU ###################
:start
menu Welcome to iPXE's Boot Menu
item
item smartos    Boot SmartOS
item
item shell      Enter iPXE shell
item reboot     Reboot
item
item exit       Exit (boot local disk)
choose --default smartos --timeout 60000 target && goto ${target}

## Utility menu items:
:shell
echo Type exit to get back to the menu
shell
set menu-timeout 0
goto start

:failed
echo Booting failed, dropping to shell
goto shell

:reboot
reboot

:exit
exit

########## MENU ITEMS #######################
:sdc
kernel /sdc/20121001T165806Z/platform/i86pc/kernel/amd64/unix -B hostname=r720test,standalone=true || goto failed
initrd /sdc/20121001T165806Z/platform/i86pc/amd64/boot_archive || goto failed
boot || goto failed

:smartos
kernel /smartos/20121004T212912Z/platform/i86pc/kernel/amd64/unix || goto failed
initrd /smartos/20121004T212912Z/platform/i86pc/amd64/boot_archive || goto failed
boot || goto failed

You can see here that the “menu” command declares a menu with a title. The elements are items with a label and description (you can assign hot keys as well). An item with no value is an empty line, and you can use the “--gap --” argument to create section headers, in the form “item --gap -- -----SmartOS-------”. Finally, the choose command puts your selection into a named variable and also allows you to specify a default selection and a timeout in milliseconds. Just about everything else is handled by the “goto” command and labels sprinkled throughout the script. Most importantly, we use the value obtained by the choose command to “goto” the label with the commands to boot the given OS. You can also have multiple menus, one which goes to the other and back, by being creative.
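
If you generate your boot scripts from a web app, the menu portion is easy to build from data. Here is a rough Python sketch of that idea; the function name and the data layout are my own invention, and it assumes the item command’s --key option is how hot keys are assigned, so check the iPXE command reference before relying on it.

```python
def render_menu(title, sections, default, timeout_ms=60000):
    """Build the menu part of an iPXE script.

    sections is a list of (heading, items); each item is (label, key, text).
    """
    lines = [":start", f"menu {title}"]
    for heading, items in sections:
        # A gap item is not selectable, so it works as a section header.
        lines.append(f"item --gap -- ----- {heading} -----")
        for label, key, text in items:
            lines.append(f"item --key {key} {label} {text}")
    lines.append(
        f"choose --default {default} --timeout {timeout_ms} target"
        " && goto ${target}"
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_menu(
        "Welcome to iPXE's Boot Menu",
        [
            ("Operating systems", [("smartos", "s", "Boot SmartOS")]),
            ("Utilities", [("shell", "i", "Enter iPXE shell"),
                           ("reboot", "r", "Reboot")]),
        ],
        default="smartos",
    ), end="")
```

The labels you pass in still need matching “:label” sections elsewhere in the script, exactly as in the hand-written menu above.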

When you couple all this together, you get an iPXE that is more powerful than ever before and extremely exciting.

I’ve taken this opportunity to update the SmartOS Documentation for PXE booting; using iPXE directly, as above, is now the officially recommended way to netboot.

Using Graphite to Graph DTrace Metrics: Part II

Monday, November 14th, 2011

In a previous entry I described Graphite and gave an overly simplistic example of integrating it with DTrace… let’s get a little more serious and see what fun we can have.

For years a problem nagged at me. I wanted to get really fine-grained latency information from an NFS server to track user experience. This isn’t an easy thing to do, especially for hundreds of exports. First off, you have to use DTrace to get that kind of data; there isn’t really any other way to find per-operation latency on a per-export basis. Secondly, writing all that data into local RRDs is a massive I/O problem in its own right. Thirdly, graphing the data once it’s in RRD isn’t hard, but creating summary “rollup” graphs (i.e., all instances of a metric in a single graph) requires writing scripts that mush all the individual RRDs together, which of course is a pain in a dynamic environment. And that’s just for starters. When you dig deeper into this problem you just find other, smaller, problems.

Many solutions were tried but only Graphite made the final cut. In particular, it’s network based with no agents; databases are dynamically created, so if new instances come or go the system simply adapts and there is no administration required; and most importantly, Graphite creates graphs based on a simple “URL API”. This all means that we can dynamically add metrics to Graphite and just as easily dynamically graph them, and that means we can get maximum power out of DTrace’s ultimate weapon: the aggregate!

So our goal is to create DTrace scripts which output aggregates that are then transported into Graphite. There are many ways to do this, but I really wanted something that could be controlled by a single script and managed via SMF. After several iterations I arrived at a solution using Perl that forks DTrace scripts which feed their output to the STDIN of a helper script that parses and transmits the data to Graphite. Let’s look at each piece.

First, the control script. This simply forks the DTrace scripts and pipes STDOUT to STDIN of the helper script.

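#!/usr/perl5/bin/perl
#
# Control script for Per-Export NFS Latency Graphite Metrics
#

use strict;
use warnings;

my @SCRIPTS = ('read', 'write');

foreach my $i (@SCRIPTS) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;

        if ($pid == 0) {
                # Child: become the "dtrace script | helper" pipeline
                exec("./nfsv3-latency.d/nfsv3-${i}-latency.d | " .
                     "./nfsv3-latency.d/graphite-nfsv3-assist.pl ${i}")
                        or die "exec failed: $!\n";
        }
        print("Forked PID $pid for $i I/O\n");
}

# Parent: stay around until the workers exit
1 while wait() != -1;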

The DTrace scripts are very simple: we trace entry and return of rfs3_read (the server side function for processing NFSv3 reads) and load the export path and latency in ms into an aggregate. Every 10 seconds, we output and clear the aggregate.

#!/usr/sbin/dtrace -s

#pragma D option quiet

rfs3_read:entry
{
        self->time = timestamp;
        self->start = 1;
        self->export =  stringof(args[2]->exi_export.ex_path);
} 

rfs3_read:return
/self->start == 1/
{
        this->elapsed   = timestamp;
        this->ms        = (this->elapsed - self->time)/1000000;

        @read[self->export] = avg(this->ms);

        self->start = 0;
        self->time = 0;
        self->export = 0;
}

tick-10sec
{

        printa(@read);
        trunc(@read);
}

The write DTrace script is the same, just substituting in “write” instead of “read”.

Now for the helper script that parses the aggregates and transmits the data to Graphite. Here we create a TCP session to the Graphite server, parse the STDIN into its 2 components, which in this case are export path and latency, then do some sanity checking to make sure data looks correct and finally send the key/value pair to the Graphite server:

#!/usr/perl5/bin/perl
#
# GraphiteAssist v0.1
# 
#
# The primary purpose is to provide a way
# for DTrace Aggregates to be injected into Graphite
#

use IO::Socket;

## Default Values:
my $GRAPHITE_SERVER = "graphite.server.com";
my $GRAPHITE_PORT   = 2003;

if ( ! $ARGV[0] ) {
        die("USAGE: $0 <metric>\n");
}
my $METRIC = $ARGV[0];

my $HOSTNAME = `hostname`;
chomp($HOSTNAME);

## Prep the socket
my $sock = IO::Socket::INET->new(
    Proto    => 'tcp',
    PeerPort => $GRAPHITE_PORT,
    PeerAddr => $GRAPHITE_SERVER,
) or die "Could not create socket: $!\n";

while (<STDIN>) {
  chomp($_);
  $_ =~ s/^\s+//; # Trim any leading whitespace
  my ($EXPORT,$VALUE,$OTHER) = split(/\s+/, $_, 3);

  ### Sanity check on the input data
  if ($OTHER) {
       # print("I got some other crap here: $OTHER (Input: $_)\n");
        next;
  }
  if ($EXPORT !~ m/\w+/) {
       # print("Export looks wrong: $EXPORT (Input: $_)\n");
        next;
  }
  if ($VALUE !~ m/\d+/) {
       # print("Value looks wrong: $VALUE (Input: $_)\n");
        next;
  }

  my $KEY = "joyent.${HOSTNAME}.exports.${EXPORT}.latency_${METRIC}";

  my $DATE = time();
  #print("Sending: $KEY $VALUE $DATE\n");
  $sock->send("$KEY $VALUE $DATE\n") or die "Send error: $!\n";

}

There you have it. We can take it a step further by controlling this via SMF, but I’ll leave that part as an exercise for the reader.

The scripts above are somewhat crude but they demonstrate the pattern here. You can use it to graph anything that DTrace can see, which is… everything. I’ve used this same pattern for monitoring VFS latency on a large scale, as well as MySQL query latency, and various types of throughput.
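
If Perl isn’t your thing, the parse-and-format step ports easily. Below is a rough Python sketch of the same idea, not a drop-in replacement: the function names are mine, the key layout is copied from the helper above, and the value check is a little stricter than the Perl regex (it only accepts whole numbers).

```python
import re
import time


def parse_printa_line(line):
    """Split one line of printa() output into (export, value), or None."""
    fields = line.split()
    if len(fields) != 2:
        return None  # blank line, header, or unexpected extra columns
    export, value = fields
    if not re.search(r"\w", export) or not value.isdigit():
        return None
    return export, int(value)


def graphite_line(host, export, metric, value, now=None):
    """Format one data point using the same key layout as the Perl helper."""
    timestamp = int(time.time()) if now is None else int(now)
    key = f"joyent.{host}.exports.{export}.latency_{metric}"
    return f"{key} {value} {timestamp}\n"


if __name__ == "__main__":
    parsed = parse_printa_line("  /export/home        12")
    if parsed:
        print(graphite_line("nfs-server", parsed[0], "read", parsed[1]), end="")
```

Keeping the parsing separate from the socket code makes it easy to test against captured DTrace output before pointing it at a live Graphite server.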

It’s the Graphite URL API that really makes this powerful, because I can glob for keys. For instance, the following URL would render ALL export latency (read/write for each export) for the last 1 hour. (This is a single URL, but I’m breaking it apart a bit to make the various arguments passed to render clear.)


http://graphite.server.com/render/?

 width=800&height=600&
 target=joyent.nfs-server.exports.*.*.*.latency_*&
 tz=utc&
 from=-1hours
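
Since the URL is just a query string, it is simple to generate. Here is a small Python sketch that builds the URL above from its parts; the function name and defaults are my own, and it only covers the handful of render parameters used in this post.

```python
from urllib.parse import urlencode


def render_url(base, target, hours=1, width=800, height=600):
    """Build a Graphite render URL for one target (globs allowed)."""
    query = urlencode(
        {
            "width": width,
            "height": height,
            "target": target,
            "tz": "utc",
            "from": f"-{hours}hours",
        },
        safe="*",  # keep the glob characters readable in the URL
    )
    return f"{base}/render/?{query}"


if __name__ == "__main__":
    print(render_url(
        "http://graphite.server.com",
        "joyent.nfs-server.exports.*.*.*.latency_*",
    ))
```

Changing the time window or graphing a single export is then just a different argument, which is handy for dashboards that build their image links on the fly.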

DTrace is a fabulous means of obtaining hard to get data, and Graphite is a fabulous means of graphing hard to graph data… combined they can accomplish almost anything.

Solaris Family Reunion: TOMORROW!

Monday, October 3rd, 2011

Sorry for the late notice, but all you folks out here in the Bay Area for Oracle OpenWorld won’t want to miss out on a very exciting event tomorrow night:

  • What? Solaris Family Reunion
  • Where? Joyent HQ, 345 California St, 20th Floor
  • When? Tuesday Oct 4th, 6PM till 10PM (and maybe a pub after that!)
  • Why? Beer! Food! Community!
  • Register here: http://smartos-estw.eventbrite.com

We’ve all gone off in different directions, but this will be an amazing and rare opportunity to get the band back together, share stories and talk about the future and just have a good time as a Solaris community.  You will not want to miss it!

Using Graphite to Graph DTrace Metrics

Tuesday, June 21st, 2011

If you haven’t heard of Graphite you are missing out on a serious operations power tool. Let me make a gross oversimplification and slightly inaccurate assertion to get you in the ballpark of understanding what it is: it’s RRDtool reimplemented for the web.

Let me be more specific for those new to it. Graphite is really made up of 3 components. The first is “Carbon”, which is a metrics collection daemon that collects data via a TCP socket (UDP is optional), caches the data and then records it to disk. The second is “Whisper”, which is the round robin database Carbon uses to permanently store your metrics on disk. The third is a Django app which can generate graphs based on your metrics via a snazzy web UI or via a simple URL API. So it implements an RRD database like RRDtool and a means of graphing the data like RRDtool, but it’s accessible via a browser and graphs dynamically, so unlike RRDtool it isn’t necessary to pre-render static graphs at some interval.

There are 3 reasons I find it really hard to ignore Graphite. Firstly, you do not need to pre-generate your databases; if you send it a metric it hasn’t gotten before it just creates the database based on a flexible schema configuration. Secondly, you can get your graphs essentially in real-time by just refreshing a URL, no pre-generation. Thirdly, you can send it metrics using something as simple as netcat. The result is an insanely flexible metrics graphing system with very little configuration required and no agents necessary.

So let me demonstrate how we can use all this power together with DTrace in a sample script:

#!/bin/bash
# Example DTrace/Graphite Integration
# Ben Rockwood 

export HOSTNAME=`hostname`
export GRAPHITE_SERVER="10.0.0.22";

/usr/sbin/dtrace -n '

#pragma D option destructive
#pragma D option quiet

BEGIN
{
        mycounter = 0;
}

syscall::read:entry
{
        mycounter++;
}

tick-1sec
{
        /* system("echo \"DEBUG: Sending data to metric dtrace.$HOSTNAME.syscall.read.entry
                                    on server $GRAPHITE_SERVER\" "); */
        system("echo \"dtrace.$HOSTNAME.syscall.read.entry %d %d\" | nc $GRAPHITE_SERVER 2003 ",
                     mycounter, walltimestamp / 1000000000);
        mycounter = 0;
}
'

So what I’m doing here is running a DTrace script via Bash. I’m using Bash as a wrapper so that I can do setup such as getting the hostname. The DTrace script itself is overly simplistic; we’re just incrementing a counter on every read system call. The “tick-1sec” probe fires every second, at which point it runs a system command and resets the counter. System commands can be destructive, so you’ll notice that pragma is set. Note that the D program is single-quoted, so Bash does not expand $HOSTNAME and $GRAPHITE_SERVER itself; they are exported so that the shell spawned by system() can expand them.

The system command we’re executing simply echoes the metric in Graphite’s format and pipes it to netcat (“nc”), which sends it to the Graphite server. The format is simple: “some.metric.name value epoch_time”. My metric here will be dtrace.newton.syscall.read.entry. (Newton is my workstation.)
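
The echo-into-netcat step is nothing more than writing one line of text to a TCP socket, so any language can do it. As a hedged illustration, here is the same thing in Python; the function names are mine, and port 2003 is simply the port used in the script above.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Return one line of Graphite's plaintext format: name value epoch."""
    if timestamp is None:
        timestamp = time.time()
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(server, name, value, timestamp=None, port=2003):
    """Open a TCP connection to Carbon, send one metric line, and close."""
    line = format_metric(name, value, timestamp)
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Only prints the line; call send_metric("10.0.0.22", ...) to transmit it.
    print(format_metric("dtrace.newton.syscall.read.entry", 42), end="")
```

Opening a connection per data point mirrors what the nc pipeline does every second; a long-running collector would normally keep one socket open instead.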

I start that running and then go to the following URL:


http://10.0.0.22:8888/render/?width=400&height=250&target=dtrace.newton.syscall.read.entry&from=-1hours

And this is what I see:

See how flexible it is? If I wanted to run this on 4 web servers I could fire up the script, unmodified, on all 4 servers and then simply modify the URL to change the hostname in the target from “newton” to “*” and it would graph all 4 together, without having to even log onto the Graphite server. This is why I love Graphite: it’s so flexible you can pretty much cram it in anywhere and get useful data in a pinch.

Word of warning: The script above is intentionally oversimplified. My point here is to illustrate the basic principles, nothing more.

Duo Security: Two Factor Auth for the Masses

Thursday, June 9th, 2011

Smart Cards, OTP, Hardware Tokens like SecurID… 2 factor auth is an old standby and considered mandatory for any high security installation. But let’s face facts, there are a myriad of problems involved. SecurID is complex and expensive and has now destroyed its credibility following the Lockheed break-in. Smart Cards are really sweet, especially solutions from ActivIdentity, but again they’re expensive and you have client hardware requirements which can be a problem with many users. OTP is nifty but most of the solutions out there are ancient and may not work with the platform you’re using. But… that is the price of security, right? And what about all these new cloud deployments, traditional 2 factor solutions for your cloud? Just shoot me.

Today I stumbled across Duo Security and was amazed.  It is an entirely modern 2 factor auth system that uses a SaaS model, open source client software and open APIs, integrates with just about anything, and uses the phone you already have in your pocket.

What’s amazing is that the guys at Duo have nailed the setup. You go to Duo Security and sign up for an account; before you’ve registered they’ve already verified your phone via an automated voice call. You finish the easy wizard and within 2 minutes you’re looking at their dashboard with a free account that supports up to 10 users. For a UNIX system you download and compile their software (packages are available for Linux distros), which has a client program as well as a PAM module. You add a new “Integration” (essentially an auth realm with its own API key), feed the keys into the client configuration (which is only 3 lines long, btw), and run the client, which gives you a URL to finish validating the host, and you’re done. 10-15 minutes after first hitting their website you are up and running 2 factor security without a bit of pain. It’s so simple it just makes me smile… and how often does anything security related do that?

Duo supplies special variations of the service that are just as easy for Juniper, Cisco and Sonicwall VPNs as well as a Web API… but I’m not going to address those here.

Once your UNIX host is set up, you have some options on how to employ it. You can use PAM, which will make all users dual auth via Duo, or you can use a nifty per-user SSH trick by adding command="/usr/local/sbin/login_duo" to the beginning of your public key in the .ssh/authorized_keys file (which I didn’t even know was possible). If you don’t have the ability to modify PAM this SSH hack is a great solution.
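
To make the authorized_keys trick concrete, here is a tiny Python sketch that shows what the resulting line looks like. The helper name is hypothetical and the key used in the demo is a truncated placeholder, not a real key.

```python
def force_command(public_key_line, command="/usr/local/sbin/login_duo"):
    """Prefix an authorized_keys entry with a forced command."""
    key = public_key_line.strip()
    if key.startswith("command="):
        return key  # already has a forced command; leave it alone
    return f'command="{command}" {key}'


if __name__ == "__main__":
    # Placeholder key material for illustration only.
    print(force_command("ssh-rsa AAAAB3Nza... benr@newton"))
```

With that prefix in place, sshd runs login_duo for any login using that key, which is where the second factor prompt comes from.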

But what’s really important is the experience of actually using it for auth. Here is how it works for real using an SSH session. When logging into your system, after accepting your password or key as usual, it stops the auth process and asks how to contact you:

Ben-Rockwoods-MacBook-Pro:~ benr$ ssh cuddletech.com
Password:
Duo login for benr

 1. Duo Push to XXX-XXX-1100
 2. Phone call to XXX-XXX-1100
 3. SMS passcodes to XXX-XXX-1100
Passcode or option (1-3): 1

Pushed a login request to your phone...

At this point the SSH is stuck.  Notice you have 3 choices: Duo Push (smartphone app), phone or SMS.  Duo Push is a free app for Android and iPhone which can accept push notifications.  When you do your setup part of the process will be installing this app if you wish, which only takes 2-3 minutes.  If you choose to use Duo Push, as I did, you’ll see something like this on your phone:

After accepting, your SSH session comes back to life:

Success. Logging you in...

Last login: Wed Jun  8 22:57:20 2011 from xxxxxx
                                __                       __
                       __      / /___  __  _____  ____  / /_
                    __/ /___  / / __ \/ / / / _ \/ __ \/ __/
                   /_  __/ /_/ / /_/ / /_/ /  __/ / / / /_
                    /_/  \____/\____/\__, /\___/_/ /_/\__/
                                    /____/
[cuddletech:~] benr$

It’s that easy!

Duo just got everything spot on: it’s easy, the documentation is clear and concise, it’s just beautiful. The best part of it all is that it’s free for up to 10 users, which means that if you just have a single web server you wish to secure, you can! Thanks to the SSH hack above you could even do it on a Shared Hosting account. There is even a plugin for WordPress to use Duo for WP login.

To get started with it yourself, I recommend this post on the Duo blog: Announcing Duo’s two-factor authentication for Unix.  It walks you quickly through the whole process I described above.

In all fairness, I’ve only been using this for less than a day so I’m sure there are kinks I’ll run into and things to be improved, but it truly is amazing that I’ve got what feels like a solid solution working so quickly.  Auditing and logging gets a lot more interesting when you don’t have to second guess whether or not the user is in fact the user you think and this product opens up a lot of new possibilities and fills a much needed gap in the world of cloud security.

NOTE FOR OPENSOLARIS/ILLUMOS PAM USERS:

After you download and unpack duo_unix-1.6.tar.gz, run “./configure --enable-pam”. Before you run “make”, edit config.h and comment out the line “#define HAVE_ASPRINTF 1”. After that PAM will compile fine. If you don’t, you’ll get “pam_extra.h:10: error: syntax error before "va_list"”. Also, make sure that you have an ‘sshd’ user for Duo to use.

ZFS Backup & Recovery using Hadoop HDFS

Monday, April 18th, 2011

Hadoop HDFS has essentially become the de facto standard in cluster file systems. In theory I’m a big fan of Lustre; I say “theory” because it never got ported to Solaris, despite the fact that Sun bought Lustre. But that’s a different story. HDFS is extremely portable and well supported by a thriving community who are doing anything you can imagine with it.

Consider a large cluster of production nodes. They almost certainly have unused disk space, and it’s probably pretty fast disk. Wouldn’t it be nice if we could aggregate all that disk together for backups? With HDFS we can. Setting up HDFS is pretty well documented (I can do it for Solaris users if there is a demand, but it’s pretty clear), so let’s see how easy it is to get ZFS backups in and out of HDFS.

Here is a DFS report from my HDFS test setup (single node):


root@newton hadoop$ hadoop dfsadmin -report
Safe mode is ON
Configured Capacity: 2942589468672 (2.68 TB)
Present Capacity: 1019145412608 (949.15 GB)
DFS Remaining: 1017809015808 (947.91 GB)
DFS Used: 1336396800 (1.24 GB)
DFS Used%: 0.13%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 2942589468672 (2.68 TB)
DFS Used: 1336396800 (1.24 GB)
Non DFS Used: 1923444056064 (1.75 TB)
DFS Remaining: 1017809015808(947.91 GB)
DFS Used%: 0.05%
DFS Remaining%: 34.59%
Last contact: Mon Apr 18 14:18:30 PDT 2011

 



So HDFS is ready to rock. Now let’s create a ZFS dataset and populate it with some data:


root@newton ~$ zfs create -o mountpoint=/backup_test quadra/backup_test
root@newton ~$ cp *.pdf /backup_test/

 

Now, let’s create a directory within HDFS to put our backups in:


root@newton ~$ hadoop fs -mkdir /zfs_backups
root@newton ~$ hadoop fs -ls /
Found 11 items
...
drwxrwxrwx   - root supergroup          0 2011-03-05 00:07 /hypertable
drwxr-xr-x   - root supergroup          0 2011-03-02 12:27 /system
drwxr-xr-x   - root supergroup          0 2011-04-18 14:22 /zfs_backups

 

Ready to rock. So let’s actually do the backup. We’re going to create a snapshot and then zfs send it to the stdin of “hadoop fs -put”. Once we’ve done that, we’ll delete our original ZFS dataset:


root@newton ~$ zfs snapshot quadra/backup_test@051811
root@newton ~$ zfs send quadra/backup_test@051811 | hadoop \
> fs -put - /zfs_backups/backup_test.051811.zdump
root@newton ~$
root@newton ~$ zfs destroy -r quadra/backup_test
root@newton ~$ zfs list quadra/backup_test
cannot open 'quadra/backup_test': dataset does not exist

 

OK, so the ZFS snapshot has been stored as a file within HDFS and we destroyed our dataset. Now, let’s recover it using the reverse procedure:


root@newton ~$ hadoop fs -get /zfs_backups/backup_test.051811.zdump \
>  - | zfs recv -d quadra
root@newton ~$ zfs list -r quadra/backup_test
NAME                        USED  AVAIL  REFER  MOUNTPOINT
quadra/backup_test         32.3M   316G  32.3M  /quadra/backup_test
quadra/backup_test@051811      0      -  32.3M  -

 

Notice that we lost our properties during the receive. Let’s fix that and check that our files are back:


root@newton ~$ zfs set mountpoint=/backup_test quadra/backup_test
root@newton ~$ ls -l /backup_test/
total 33076
-rw-r--r-- 1 root root   330028 2011-04-18 14:20 Deployment_Guide_for_HP_ProLiant_Servers.pdf
-rw-r--r-- 1 root root    88378 2011-04-18 14:20 GeekBench-Receipt.pdf
-rw-r--r-- 1 root root   101243 2011-04-18 14:20 HP_ProLiant_Health_Monitor_User_Guide.pdf
-rw-r--r-- 1 root root    90844 2011-04-18 14:20 HP_ProLiant_Support_Pack_User_Guide_861.pdf
-rw-r--r-- 1 root root   123419 2011-04-18 14:20 inthebeginning.pdf
-rw-r--r-- 1 root root 21337122 2011-04-18 14:20 Jurans Quality Handbook.pdf
-rw-r--r-- 1 root root  1338119 2011-04-18 14:20 perc-technical-guidebook.pdf
-rw-r--r-- 1 root root  2401352 2011-04-18 14:20 PowerEdgeR510_Technical_Guidebook[1].pdf
-rw-r--r-- 1 root root  5923274 2011-04-18 14:20 R710-HOM.pdf
-rw-r--r-- 1 root root  1103929 2011-04-18 14:20 server-poweredge-r710-tech-guidebook.pdf
-rw-r--r-- 1 root root   504059 2011-04-18 14:20 TheQualityTrilogy.pdf

 

Please note that the dataset properties were not retained because I’m using an old ZPool (Version 18) and a plain “zfs send”. If you’re running a newer pool version (check with “zpool get version pool_name”), “zfs send -p” (or “-R”) will include the properties in the backup stream.

So there you go. Backup and recovery using ZFS send/recv to and from HDFS. Straightforward and easy to implement.
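
If you want to script this rather than type the pipelines by hand, the two directions are mirror images of each other. Here is a rough Python sketch using subprocess; the function names are mine, the commands are exactly the ones shown above, and there is no retry or cleanup logic.

```python
import subprocess


def backup_commands(snapshot, hdfs_path):
    """Commands for: zfs send SNAPSHOT | hadoop fs -put - HDFS_PATH"""
    return ["zfs", "send", snapshot], ["hadoop", "fs", "-put", "-", hdfs_path]


def restore_commands(hdfs_path, pool):
    """Commands for: hadoop fs -get HDFS_PATH - | zfs recv -d POOL"""
    return ["hadoop", "fs", "-get", hdfs_path, "-"], ["zfs", "recv", "-d", pool]


def run_pipeline(producer, consumer):
    """Run producer | consumer and raise if either side fails."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout)
    first.stdout.close()  # let the producer see a broken pipe if the consumer dies
    second_rc = second.wait()
    first_rc = first.wait()
    if first_rc != 0 or second_rc != 0:
        raise RuntimeError(f"pipeline failed: {first_rc} | {second_rc}")


if __name__ == "__main__":
    # Prints the backup pipeline; pass the pair to run_pipeline() to execute it.
    send, put = backup_commands(
        "quadra/backup_test@051811",
        "/zfs_backups/backup_test.051811.zdump",
    )
    print(" ".join(send), "|", " ".join(put))
```

Checking both exit codes matters here: you do not want to destroy the original dataset if either the zfs send or the HDFS write failed.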

OpenSolaris R.I.P.: The Day is Finally Here.

Friday, August 13th, 2010

This is a real thing. This is not hype or idle rambling. OpenSolaris is, as of Friday the 13th of August, 2010, dead. Read the full skinny in the leaked internal email to Solaris Engineering.

Here is the short version: OpenSolaris is dead. No more real-time/nightly code pushes. OpenSolaris 2010.05 will not happen, nor will any in the future. Solaris 11 Express will be the new “developer” release which will be available through OTN. Solaris will remain open source, but code will only be released after the product ships, not before.

Now, let’s go bit by bit.


Today we are announcing a set of decisions regarding the path to
Solaris 11, and answering key pending questions on open source, open
development, software and binary licenses, and how developers and
early adopters will be able to use Solaris 11 technology before its
release in 2011.

So, Solaris 11 is the new hotness and the “community” is reduced to “early adopters”.


Solaris must stand alone as a best-of-breed technology for Oracle’s
enterprise customers. We want all of them to think “If this has to
work, then it runs on Solaris.” That’s the Solaris brand. That is
where our scalability to more than a few sockets of CPU and gigabytes
of DRAM matters.

This goes on for a while, but the message is clear. Solaris needs to not simply be another UNIX OS… it needs to be, as it was in the ’90s, the enterprise platform of choice.


We will continue to grow a vibrant developer and system administrator
community for Solaris. Delivery of binary releases, delivery of APIs
in source or binary form, delivery of open source code, delivery of
technical documentation, and engineering of upstream contributions to
common industry technologies (such as Apache, Perl, OFED, and many,
many others) will be part of that activity. But we will also make
specific decisions about why and when we do those things, following
two core principles: (1) We can’t do everything. The limiting factor
is our engineering bandwidth measured in people and time. So we have
to ensure our top priority is driving delivery of the #1 Enterprise
Operating System, Solaris 11, to grow our systems business; and (2) We
want the adoption of our technology and intellectual property to
accelerate our overall goals, yet not permit competitors to derive
business advantage (or FUD) from our innovations before we do.

This, really, isn’t so bad. But again, no community, just end-users. A return to focus isn’t a bad thing.


We will continue to use the CDDL license statement in nearly all
Solaris source code files. We will not remove the CDDL from any files
in Solaris to which it already applies, and new source code files that
are created will follow the current policy regarding applying the CDDL
(simply, that usr/src files will have the CDDL, and the very small
minority of files in usr/closed might not have it).

Ok, so existing code will not be closed. So, no drastic change.


We will distribute updates to approved CDDL or other open source-
licensed code following full releases of our enterprise Solaris
operating system. In this manner, new technology innovations will
show up in our releases before anywhere else. We will no longer
distribute source code for the entirety of the Solaris operating
system in real-time while it is developed, on a nightly basis.

So here is the killer… what I’ve been afraid of. No more nightly code. The upshot is that the code will still be available following releases to assist with DTracing, debugging, etc, but you won’t get real-time updates. The biggest downside is that you can’t see bug-fixes as they are put-back, and obviously anyone developing on Solaris is always playing catch up. It says “full release”, so I can’t expect that code will ship with each Express release. Maybe it will, I hope so.

It goes on to say that “technology partners” (such as Intel) will have full source access via OTN.


We will encourage and listen to any and all license requests for
Solaris technology, either in part or in whole. All such requests will
be evaluated on a case-by-case basis, but we believe there are
many complementary areas where new partnership opportunities exist to
expand use of our IP.

This is a sticky place. Code is shipped CDDL post-release, however they want to establish partnership opportunities. Clearly they are trying to ensure any businesses which rely upon Nevada will not escape from the partner programs and thus revenue opportunities for Oracle.


We will deliver technical design information, in the form of
documentation, design documents, and source code descriptions, through
our OTN presence for Solaris. We will no longer post advance
technical descriptions of every single ARC case by default, indicating
what technical innovations might be present in future Solaris
releases. We can at any time make a specific decision to post advance
technical information for any project, when it serves a particular
useful need to do so.

Flush… there goes ARC. So the external view into Solaris development is now closing. We now only see what they wish us to see.


We will have a Solaris 11 binary distribution, called Solaris 11
Express, that will have a free developer RTU license, and an optional
support plan. Solaris 11 Express will debut by the end of this
calendar year, and we will issue updates to it, leading to the full
release of Solaris 11 in 2011.

So, back to the old days.


All of Oracle’s efforts on binary distributions of Solaris technology
will be focused on Solaris 11. We will not release any other binary
distributions, such as nightly or bi-weekly builds of Solaris
binaries, or an OpenSolaris 2010.05 or later distribution. We will
determine a simple, cost-effective means of getting enterprise users
of prior OpenSolaris binary releases to migrate to S11 Express.

There is the axe on OpenSolaris, present and future. The distro isn’t coming. No nightly. No BFUs.


We will have a Solaris 11 Platinum Customer Program, including direct
engineering involvement and feedback, for customers using our Solaris
11 technology. We will be asking all of you to participate in this
endeavor, bringing with us the benefit of previous Sun Platinum
programs, while utilizing the much larger megaphone that is available
to us now as a combined company.

And here we see again, it’s “back to the future”. Pay to play.

The Verdict

Frankly, I’m not surprised by any of this. Saddened, certainly, but not shocked. The sleigh ride is officially over.

As far as the community and governance is concerned, the OGB played right into Oracle’s hands. It might as well have been engineered this way. On Monday, the 16th, the OGB will disband and default on the charter. Great work guys! Thanks for truly representing the needs and desires of Ora…I mean, the community.

As a governance effort, OpenSolaris has been a non-stop, end-to-end failure. Hands down. At every turn, it failed.

As an open source project, it was lukewarm at best.

What I will miss is having full access to Solaris Engineering. What’s happening, where we’re going. That was amazing. An all access pass. I will truly miss that.

The plus side is that, for all the ups and downs, the code is out there. They can’t take that back. And we have reasonable assurances that it will stay out there following “full releases”. That’s not ideal, but it’s something. Something very valuable.

As for me… Illumos will now carry the torch, and I’ll participate in that with all the more gusto. This blog existed prior to OpenSolaris and it will continue to be a Solaris blog after. Solaris is the best platform on earth, and it continues to be, in any given form.

No News is Bad News

Monday, May 17th, 2010

A reader wrote today wondering why my entries have slowed down and there isn’t a lot of news coming. Quite simply there isn’t much to say. I’ve felt a need to return to blogging various smaller technical posts just to keep the blog on life support until something happens at Oracle.

Larry ranted about Sun’s problems to Reuters recently. Perhaps the only surprise was this: “More infuriating, says Ellison, is that Sun routinely sold equipment at a loss because it was more focused on boosting revenue than generating profits.” I think we all knew Sun was taking it in the shins to push sales, but apparently it was more widespread than I was aware. Larry added that Sun spent a fortune on airfare in the last days of a quarter to pack it in.

But I, like many of you, am focused on where we are now. Right now Oracle isn’t saying squat about the future of Solaris… just “it’s not dead, please wait.”

Regarding Solaris 11: there is no word. I personally believe with complete confidence that Oracle will announce Solaris 11(g?) at OpenWorld this September. I have no proof or evidence; I just personally believe it to be consistent with how Oracle operates and the development pace in Solaris Engineering. I think they’ll stay quiet until that release and then push Solaris forward with great gusto.

Regarding OpenSolaris 2010.03: it’s now 2.5 months late. It could come this week, it could come in 3 months… there is no way to tell. OpenSolaris dev updates on pkg.opensolaris.org have stalled at snv_134, so we know that snv_134 is the target build for 2010.03, but that’s about it. I know they are working hard toward it, but I don’t know why or how or what precisely they are doing. Maybe they are vetting out all the compatibility issues or fixing AI so it can get real adoption, but what’s clear is that they are putting a lot of effort into something a lot of people think will be killed or sidelined.

A theory that just popped into my head is that Solaris 11 itself may be based on snv_134 as well and they are working on both to align them. Maybe that’s possible. I don’t know, it’s just a thought that comes to mind.

In the meantime, there is an increased exodus to Linux and other platforms, particularly from customers who were unsatisfied with Solaris 10 and embraced SX:CE as an interim solution.

Some of you will recall my OPEN LETTER TO ORACLE. It got no response. None. Nothing.

Some of you will have noticed my push to explore the OpenSolaris delays through “proper channels”, that is, through the OGB. The idea was this… OpenSolaris is developed in several OpenSolaris.org Community Groups (CGs), such as pkg (for IPS), Indiana, Distribution, ON (the kernel itself), SFW, etc. Therefore, if we wish to get answers, perhaps the OGB should simply formally request a status report from the appropriate CGs! That makes sense, right? But the board members can’t seem to think this way… they insist that “OpenSolaris” the distro is an Oracle-controlled thing and therefore won’t even try. The OGB has no interest in governing their own community or even making attempts, vain as they may be, to make progress. This is precisely why I declined nomination this year: the OGB is completely useless. The press that resulted from my prodding turned into a useless circus with OGB members suggesting a fork is imminent…. and that’s simply madness.

But… is hope lost? NO! Is Solaris dead? NO! I still believe that things will be made right and that a new era is opening up for Solaris. We’re simply in a very scary transition period which will pass with time. Once Solaris 11 is released we’ll look back and wonder why we all made such a big fuss…. but it would sure be easier if we got a little love from Oracle or Solaris Engineering.

Keep the faith. Try to keep your organizations from fleeing a great platform and just stick it out as long as you can. Let’s all hope that Oracle gives us a little support here and let’s all just keep taking it in the crotch until that time.

OPEN LETTER TO ORACLE: (Open)Solaris Roadmap

Tuesday, February 2nd, 2010

Dear Oracle,

Congratulations on the EU approval of your acquisition of Sun Microsystems, Inc. Many of us in the various Sun communities spent years working closely with Oracle products on Sun technology and feel right at home being part of the Oracle family. The business savvy and dedication to customer success will be a welcome change in the direction of all of Sun’s technologies.

While the strategy webcasts and FAQs have been fantastic, there are many questions customers have regarding the future of Solaris, OpenSolaris and the technologies within. It’s no secret that for several months Oracle has been involved to some degree in Sun engineering directions and therefore it does not seem unreasonable to ask for answers even so soon after the EU green-light.

First, and of foremost concern, is the future of the Solaris product for enterprise customers, currently “Solaris 10”. Will there be a Solaris 11? (It would fit nicely with Oracle’s scheme, btw.) Will it be compatible with existing Solaris technologies (Jumpstart, SysV PKGs, etc.) or will the existing path to scrap these technologies in favor of new and unproven solutions created within the OpenSolaris platform be chosen instead?

Please understand that until recently customers could choose the traditional product (Solaris 10), the advanced development product (OpenSolaris Distribution), or use the bridge between these two worlds: Solaris Express Community Edition (SX:CE). However, with SX:CE’s recent retirement Solaris shops are forced to make a choice: go forward and accept the uncomfortable and disruptive changes of the OpenSolaris Distro or fall back to the technically inferior but fully supported and well understood Solaris 10. Sadly, some are opting to leave altogether due to a lack of direction.

Decisions need to be made and customers need guidance in order to make them. Consistent with Sun’s legacy, the OpenSolaris project has been phenomenally successful in empowering customers and driving innovation; however, management has continually failed to produce a coherent roadmap for enterprises to bank on.

Therefore, I would humbly ask that Oracle definitively provide guidance on the following:

  • A roadmap for enterprise Solaris customers
  • Guarantees with regard to the well-being and sustained viability of OpenSolaris as an Open Source community (independent of “OpenSolaris” as a distribution)
  • Future support and development for Solaris virtualization technologies, namely xVM (the best Xen solution in the industry thanks to ZFS, Crossbow, FMA, etc.) and Containers (the best Xen alternative in the industry), with respect to how they will complement, supplement or be replaced by “Oracle VM”

I look forward to these details which will hopefully put an end to the Solaris FUD and put us back on a path of profitable and productive growth, for the sake of the community, customers, and Oracle itself.

Ben Rockwood

(Open)Solaris Developer & Evangelist

xxxxSolaris?

Sunday, November 1st, 2009

Here’s something in the category of “things that make you go wha?!?”: The OpenSolaris Security Summit has been renamed to simply “Solaris” Security Summit.

If we’ve been looking for the first shot fired at OpenSolaris this would seem to be it. The question is, what’s next? When you combine this with the recent resurrection of “Solaris Next” (aka: Solaris 10++) it starts suggesting something is in the works, undoubtedly Oracle orchestrated.

Now, at this point I’m not jumping to any conclusions, and I don’t think you should either. Oracle’s intentions seem fairly clear at the moment and entirely positive for the future of Solaris and SPARC, and we know that x86 is also a part of that vision. Turning some love away from the OpenSolaris distro towards Solaris will be a welcome change for large enterprise customers, and undoubtedly a motivating factor.

My advice is to watch and wait… the wheels are turning.

If folks from Oracle/Sun are reading this: do what you wish with the Solaris product roadmap, but the community and source for Solaris are a critical part of a successful future. Please feel free to reassure us that we won’t lose that. I personally rely on access to the source for problem analysis and research on a daily basis, and having access to Solaris developers, both badged and unbadged, is something I never want to be without again.