X4200 M2, Sun, and a Sprinkling of Revolution

Posted on April 20, 2007

This is a follow-up to my previous post: Getting Fed Up With Sun: Can’t Get Systems, Breaking Existing Ones. I’m going to break things down area by area, given that I covered a lot in that one post.

Sun Fire X4200 M2: Changes And What You Should Know.

At Joyent we build out entirely on Sun X4100 dual-processor, dual-core, 2.4GHz systems with 16GB of RAM for almost all of our products and solutions. It’s a server that we know well and really love. In each of the environments where they are used we have at least 3 separate networks (public/private/storage), which makes Galaxy systems ideal out of the box, with no expansion Gigabit Ethernet cards required. Recently we ran into constraints acquiring new systems to stay ahead of customer demand, and as a stopgap while we waited for X4100s we were sold X4200 M2s at the same price we normally pay for X4100s. I’ll say at the outset that our VAR, FusionStorm, was extremely generous in doing this for us and we’re appreciative of them offering this type of solution rather than just making us wait.

Of course, we were given X4200 M2s because they should be identical, at least functionally, to the X4100s we were already buying, just in a 2U form factor. We were dismayed when this wasn’t the case, and that frustration came through in my previous post. So let’s step back and see what did change:

  • M2 2.4GHz systems use the Dual-Core AMD Opteron Processor 2216; non-M2 2.4GHz systems use the Dual Core AMD Opteron Processor 280.
  • M2 systems use DDR2/667 ECC; non-M2 systems use DDR1/400 ECC. (This bumps peak transfer from 3.2 GB/s to 5.3 GB/s.)
  • M2 systems provide 10.7 GB/s access between each processor and memory, as opposed to 6.0 GB/s in non-M2 systems.
  • M2 systems feature 4 8-lane PCIe slots and a single 64-bit 133MHz PCI-X slot, whereas non-M2 systems feature 5 64-bit PCI-X slots (1 at 133MHz, 1 at 100MHz, 3 at 66MHz).
  • M2 systems feature 4 on-board Gigabit Ethernet interfaces: 2 are NGE (provided by the chipset) and 2 are Intel E1000g, as opposed to non-M2 systems, which offer 4 Intel E1000g’s.
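
As a rough sanity check on the memory figures in the list above, peak theoretical bandwidth is just the transfer rate multiplied by the bus width. The sketch below assumes a 64-bit (8-byte) bus per memory channel and two channels per processor; those assumptions and the function name are mine, not taken from Sun's spec sheets.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_bytes=8, channels=1):
    """Peak theoretical bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bus_bytes * channels
    return bytes_per_sec / 1e9


# Single-channel figures (assumed 8-byte bus).
ddr1_400 = peak_bandwidth_gbs(400)
ddr2_667 = peak_bandwidth_gbs(667)

# Dual-channel figures (assumed two channels per processor).
ddr1_400_dual = peak_bandwidth_gbs(400, channels=2)
ddr2_667_dual = peak_bandwidth_gbs(667, channels=2)

print(f"DDR1/400: {ddr1_400:.1f} GB/s single, {ddr1_400_dual:.1f} GB/s dual")
print(f"DDR2/667: {ddr2_667:.1f} GB/s single, {ddr2_667_dual:.1f} GB/s dual")
```

Under these assumptions the single-channel numbers match the 3.2 and 5.3 figures in the list, and the dual-channel DDR2/667 number matches 10.7. The dual-channel DDR1/400 number works out to 6.4 rather than the 6.0 listed, so that figure may be measured differently.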

Now, most of that is good change. The two things that will hit you out of the box are:

  1. The default installed Solaris 10 release will warn you on boot that you need to upgrade your BIOS due to an AMD erratum. A couple of reboots will make you understand why: the system will sometimes crash on boot (see OpenSolaris BugID 6391774 and ensure that you’re running Solaris 10 Update 2 or Nevada 36 or newer). If you’re not normally in the habit of upgrading the BIOS when you rack systems, you’ll need to plan to. The BIOS upgrade can only be done using the Service Processor’s Ethernet port (the Tools & Driver CD isn’t bootable as an alternative). If you’re like me and don’t use that Ethernet port (we access the SP via serial console to cut down on our already high Ethernet port count), you’ll need to wire it up for the upgrade. I did this by cross-connecting my MacBook’s Ethernet port to the SP, assigning an IP to the SP port, then accessing the web interface (which is nice) and using it to upgrade to ILOM 1.1.1, which includes the latest BIOS. The process takes about 10-15 minutes per system and I didn’t have any problems.
  2. The NVIDIA Gigabit Ethernet ports are interfaces 0 & 1; the Intel E1000g’s are 2 & 3. I could not get the NVIDIA Boot Agent to PXE boot for JumpStart. To get around this I configured my storage network (interface 2) with a DHCP/PXE server, entered the X4200 M2 BIOS and pushed the Intel E1000g’s to the top of the boot order, and rebooted the box. It JumpStarted perfectly on that interface, and afterward I re-configured networking on the box to match our standard configuration. A side note: if you have scripts or processes that are aware of the interfaces, you’ll need to change them. For instance, we migrate Solaris Containers between systems, so we now have to ensure that when Containers are put on an M2 system they are re-configured to use nge0 and nge1 instead of e1000g0 and e1000g1. VNICs may ultimately provide a solution.
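
The interface renaming needed when a Container lands on an M2 system is easy to script. Here is a minimal sketch, assuming the Container's network settings are available as a plain list of interface names. The mapping table and the `remap_interfaces` helper are hypothetical illustrations, not part of Solaris or any Sun tool.

```python
# Hypothetical mapping from the interface names we use on X4100s
# to their equivalents on an X4200 M2 (taken from the text above).
M2_INTERFACE_MAP = {
    "e1000g0": "nge0",
    "e1000g1": "nge1",
}


def remap_interfaces(interfaces, mapping=M2_INTERFACE_MAP):
    """Return the interface names rewritten for an M2 host.

    Names without an entry in the mapping are passed through unchanged.
    """
    return [mapping.get(name, name) for name in interfaces]


container_nics = ["e1000g0", "e1000g1"]
print(remap_interfaces(container_nics))
```

Keeping the mapping in one table means a migration script only has to pick the right table for the destination hardware, rather than hard-coding interface names throughout.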

Clearly, based on the specs, we can see why a move from the non-M2’s AMD 8000 Series chipset to the NVIDIA chipset is a good choice for the performance minded, so long as we don’t see reduced stability (time will tell). Therefore it’s only the NGE decision that will really be noticeable to you as an end user (and the crashes, but that’s the fault of the CPU, not the chipset).

It’s fairly obvious why they would choose to introduce this change to the onboard gigabit interfaces: they are already integrated into the board! This means reduced chip count, reduced cost, reduced heat, reduced power consumption, etc. However, I still firmly believe it’s bad form, like buying a car with Goodyears on the front and Bridgestones on the back, or a system with “identical drives” that has a Seagate and a Maxtor.

Mr John Fowler, Executive VP of the Systems Group, contacted me and I posed these questions to him. He promptly responded and CC’ed Ali Alasti and Andy Bechtolsheim asking them to review the design decision and inquire about what that meant for the rest of the product line. I was also independently contacted by Andy Currid of AMD regarding my distaste for NGE. Long story short, they have reviewed Sun’s position on NGE, NVIDIA has been informed of my many problems with the NVIDIA Boot Agent, and solutions have been proposed all around.

I’m extremely thankful to Mr Fowler, Ali Alasti, and Andy Currid for all their discussion and interest in feedback. I’m also thankful to those indirectly involved in reviewing these things. NGE may be here to stay or it might not, but I’m satisfied that the voice of the customer has been heard, that the many who commented on this blog and mailed me privately have been heard, and that it was seriously discussed and put on the table as an issue for serious review. That’s the most any customer can hope for; after that you just cross your fingers for change.

My advice for anyone considering Sun X4100 and X4200 systems is to carefully decide whether these NGE issues have any meaning to you. Clearly there are performance advantages on M2 that shouldn’t be dismissed unless the lack of the two Intel E1000g’s is a serious problem that you can’t avoid. If it is, make your order and simply demand non-M2 systems, and be very explicit with your sales guy as to exactly why so he can push that feedback upstream.

Availability to Systems Inventory

Whenever you talk about getting systems fast, the name that comes up is Dell. And there is a great reason why… they aren’t a great server company, imho, but they are a great manufacturing company. Whatever you want, in any configuration, Dell can make it and ship it in record time. It’s well known that part of the reason EMC and Dell partnered was so that Dell could show EMC how to improve its manufacturing ability.

When you hear about my availability problems, something very important should be noted: we run what Sun considers a “custom” configuration. I want to be clear that if we chose to run a “boxed configuration” we’d get them much, much more quickly. The big issue with the stock configurations is trying to get systems with 4 drive bays, 16GB of memory, and 4 2.4GHz cores. If we buy stock 4GB configs we end up having to junk the memory, which is a waste of money. We’ve hoped in the past for a no-memory stock config but I’m unaware of one yet. So to be clear, it’s not that Sun doesn’t have systems, it’s that they are slow to arrive in our configuration.

Wes Adams, Chief Customer Advocate at Sun, took the lead helping us get systems. We already had a deal worked out with our VAR that would keep a bulk of systems always in reserve for us so that we could always have systems ready, but fulfilling that bulk order was slipping further and further. Mr Adams got on top of the matter and today systems are in our VAR’s warehouse ready for us to grab and go. It was really amazing how he pulled so many parties and departments together to get things on track. We’re really grateful to Wes Adams and those behind the scenes for making it all happen so quickly.

Clearly this is one area where Sun needs to improve in the future. Having great people help in times of need is great, but simply keeping the problems from ever happening is the goal. This isn’t anything new for Sun or its customers. And frankly, let’s be honest, most companies have a tough time turning out custom configurations; it’s Dell that has really raised expectations so much.

I’m hoping that Sun will really push hard in this area in the future. Sun was, many years ago, a company that put technology in customers’ hands the day it was announced. That changed, and now we get announcements months before product ships, and even then it’s been hard to keep up with volume. Fixing this issue will directly improve Sun’s bottom line, so you can bet they want to fix it. I have no reason to believe they aren’t serious and passionate about pushing product into customers’ hands as fast as possible, but clearly they haven’t succeeded yet. We’ll wait and watch. It’s not a trivial thing to fix.

Sun Support: Management & Resources

I wanted to elaborate on my jabs at Sun Support before. Sun has 3 major resources of great potential:

  1. docs.sun.com: Access to product documentation
  2. SunSolve: Online Knowledge base, access to patches and patch reports, and the Systems Handbook (the old FE’s Handbook)
  3. Sun’s Online Support Center (OSC)

Of course there are others, like Sun BluePrints, Sun’s BigAdmin, etc., but those 3 are the important ones. So let’s go one by one…

Sun’s documentation is really very good. Could it improve? Sure, but it really is professional grade and well composed. docs.sun.com, however, is an aging site that’s slow to navigate and very difficult to search. More and more the solution seems to be “don’t use docs.sun.com”; for instance, systems docs now live on Sun.com (example: X4200 Documentation). At some point they should either build out www.sun.com/documentation and trash docs.sun.com or unify both into something entirely new. In my opinion, IBM’s Redbooks (www.redbooks.ibm.com) is the model docs site.

SunSolve is in a similar predicament. Despite several improvements and upgrades over time, the site is still an aging relic. The knowledge base is really more of a hodge-podge of documents than something cohesive, and it doesn’t integrate with product documentation. In short, there is very little value add or self-service capability provided by the support site. The addition of the Systems Handbook several years ago was a great move, but I was greatly dismayed when they removed the internal project names from each system. That may seem very trivial, but the bugs database, code putbacks, and many of the info-docs refer to systems by these code names (example: What is Munich? There are code putbacks for it, but which system is it?).

Additionally, and I’ve made this comment many times before, the fact that a search on SunSolve uses the standard Sun.com search page feels very off-putting. It doesn’t feel like I’m getting something special or something of value, just another function of Sun.com.

That brings us to the Online Support Center (OSC), which is available to customers with a support contract. Customers who choose to use Sun Remote Services (SRS) benefit by being able not only to manage contracts with OSC but also to see the SRS status information. However, SRS requires a Sun-owned system to be put onto your network (at least, it did when I evaluated it at Homestead about 2 years ago) and requires installation of management agents on all systems. The OSC is frequently down or the site times out, and it’s only available to Spectrum contract holders.

Convergence would be very welcome. If docs.sun.com got a makeover and all Sun documentation was centralized. If SunSolve’s knowledge base got a makeover and was greatly expanded, perhaps by better cooperation between SunSolve and BigAdmin. If internal project/code names were put back into the Systems Handbook. If you could manage all your systems, warranties, and contracts via SunSolve, eliminating the barrier between SunSolve and the OSC. On and on.

Frankly, while I’m loath to admit it, Red Hat has a pretty decent management site for customers. I can see all of my systems and all of my contract information on a per-system basis. I’d love such an interface for my Sun assets. If I could log onto a page and see all my systems, edit information about them, and see the serial numbers, warranty/support status, etc., that would be ideal. This would supply Sun with access to customer deployment information and allow customers to better manage their assets. Having the option to use a “phone home” agent to supply such information automatically to this imaginary asset tracker would be great, in addition to the ability to self-register systems too sensitive to have such a phone home agent present. Right now SRS is too much and the OSC is too little.
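
To make the idea a little more concrete, here is a minimal sketch of the kind of per-system record such an asset tracker might hold. Everything in it (the `SystemRecord` class, its field names, and the sample serial number) is hypothetical and does not describe any existing Sun service.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SystemRecord:
    """One tracked system, as a customer might want to see it listed."""
    serial: str
    model: str
    warranty_end: date
    contract_id: Optional[str] = None      # None when no support contract
    registered_by: str = "self"            # "self" or "agent" (phone home)

    def under_warranty(self, today: date) -> bool:
        """True if the warranty has not yet expired on the given day."""
        return today <= self.warranty_end


# Sample data, invented purely for illustration.
record = SystemRecord(
    serial="EXAMPLE-0001",
    model="X4100",
    warranty_end=date(2010, 4, 20),
)
print(record.under_warranty(date(2007, 4, 20)))
```

The `registered_by` field is there to capture the two registration paths described above: systems reported automatically by an agent and systems entered by hand.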

This is, of course, a lot of work! It would take a lot of planning and cooperation between a variety of Sun departments, but I think it would pay great dividends over time. If this were extended further to online support management, such as managing your past and present Sun Support cases in a nice smooth way, it would be all that much more powerful. The point is not that these things don’t exist today, but that they are poorly integrated, available to only certain customers, and aging very, very quickly.

To suggest that these types of improvements haven’t been suggested in the past or even acted upon would be arrogant, to be sure. But year after year there has been a hope or belief that these things would improve and to date they have not.

The important takeaway here is that if you’re a customer you should do the best you can to convey your wishes to Sun. We don’t have to bombard Jonathan with email, but talk seriously with your Account Manager or Sales Rep with an earnest desire to see change. As I said before, just throwing your hands in the air and buying from a competing vendor is non-productive. When Sun sees a change in market dominance they may see people choosing Dell or HP over Sun, but why? Why did you choose something else? Was it the price of Sun solutions? Poor online support facilities? A bad experience with Sun Support? A bad sales experience? Being unable to actually buy that Thumper you’re dreaming of? This is all information that needs to be constructively conveyed.

Before I leave the topic of Support, I do want to say that the Sun Support organization is, by and large, very, very good. I’ve been very happy with the professional manner in which both Sun Phone Support and members of the FE or SSE organizations conduct themselves. Sure, I’ve had my bad experiences like anyone else, but 99% of the time saying the words “Please escalate this ticket” will speed things along. Sun’s support personnel should be very proud of the service they provide and its quality. If you don’t believe me, call Apple Support and talk to a teenager who’s completely unsympathetic to your issue as you listen to hundreds of other support people behind her.

A little Revolution Now and Again is a Healthy Thing

Last Tuesday Jonathan blogged about What Brand Means. I identify with him in more ways than one. Tamarah and I got married in 1999 just as he did. We married in the church I had grown up in, in Vallejo, northeast of San Francisco. We planned to roam Northern California camping out of our VW Jetta for our honeymoon and decided to spend the first night in Reno at the Reno Hilton and then the following day start our camping adventures in Tahoe. After driving 5 hours or so to Tahoe, time which flies by when in that glorious “We’re finally married!” stupor, we arrived in the lobby and enthusiastically checked in. As it turned out some big event was going on and the entire hotel was booked solid and, lo and behold, we arrived after 10pm so they’d given away our room. They offered to let us wait in the casino with hopes that something would open up…eh, no thanks.

After talking to every manager possible it was clear that there was nothing that could be done, and so we went downtown to the cluster of casino hotels in search of something… anything. We stumbled desperately into the El Dorado and up to the front desk and were promptly told there were no rooms available. In frustration I groaned, “Please, we’ll pay anything, we just got married, we’re only staying for one night, and we’re exhausted… anything, really, anything will do.” With a smile and that “This seems like a line from a movie” sort of look, he said, “Well, there is one room that just became available but it’s our (blah blah) suite and costs (major money).” “WE’LL TAKE IT!” As it turns out it was the nicest suite in the hotel and they cut us a major break on the room. It was a far better room than we would have had at the Hilton and the staff was really great to us. We spent the wee hours of the night in our gangster-like suite drinking coffee and eating cheesecake in the hot tub and wondering how we’d gone from newlywed vagabonds in a VW full of camping gear to living it up in a suite that was fit for Frank Sinatra.

The rest of our honeymoon was a dream and we were spacey with excitement and happiness that didn’t fade (except for a minor mistake in venturing to Eureka) until we’d finalized the honeymoon week with dinner at our favorite restaurant in San Francisco’s Haight District: Zare.

But back to Jonathan’s point… he rightly says, “But a brand must go beyond a promise. To me, a brand is a cause – a guiding light.” And this is why I think so many of us love Jonathan so much. He gets it! He understands what Sun is and what it means to so many of us. As he says later in his post: “That’s not about money or resources or training or contracts. It’s a cause. One your employees – and more critically, your customers – willingly join.” This has also been true of Sun, from the very beginning, I believe. When Scott wouldn’t take “No” for an answer, when Andy sanded down ICs to make them fit: this is a company that’s different, and that spirit is instilled into its customers, turning them into evangelists.

Sun is, in so many ways, the AT&T, the Xerox PARC, of our day. It’s the home of legends like Tim Bray, James Gosling, John Gage, Andy Bechtolsheim, David Yen, Whitfield Diffie, Jeff Bonwick, Bill Moore, Bryan Cantrill, Tim Marsland, Mike Shapiro, Marc Tremblay, Steve Rubin, and so many more. Look at where innovation is happening today, look at where the innovators work… Sun Microsystems. Sun is a company that does great things, fosters and produces innovation, dares to take chances (Thumper and Blackbox certainly fit that bill), and stands behind the technology it produces, innovating yet further when others have begun to throw in the towel (SPARC dead? Hellooooooo Rock!). It’s only natural that great people are drawn to work at Sun, customers want to work with Sun, and users far and wide want to be a part of Sun’s success.

When OpenSolaris took its first baby steps during the early Pilot phase, there was lots of talk about how to “build a community” around OpenSolaris. I kept telling people over and over, “There is a community! A thriving one, that’s existed for years! You don’t need to build a community, you need to invite the one you have inside!” And that’s just what happened, and it’s what was responsible for the seemingly overnight success of OpenSolaris: the community was there and waiting all those years, and finally it was able to embrace Solaris fully and in a new and unique way.

But like all relationships, even a couple desperately in love sometimes need to kick each other in the butt. Tamarah and I don’t fight, but once in a while we do need to kick the other in the butt. That’s partly what a relationship is about: observing the other and pointing out when certain behavior isn’t healthy or in their best interest, not for your sake, but for theirs. That fits squarely into “What are friends for”, because it’s better coming from your spouse or a friend than from an enemy… or in-law, as the case may be.

The previous blog entry I made got an excellent response. I got dozens of private mails agreeing with my observations and frustrations, and was a bit shocked when the reactions even spilled over into Jonathan’s blog entry referred to above. A bit of pent-up frustration or a mini-revolution, who knows, but clearly my frustrations are not just my own. As I stated above, I’m adamant that customers, users, evangelists, and anyone who cares for Sun as a company, as an ideal, as a cause, should keep the lines of communication open and make their wishes, desires, and demands known. No one person is going to scream so loud as to create major change, but a chorus will… and what’s best is when that chorus of voices isn’t organized, isn’t directed, but is simply the voice of the customers independently saying the same sorts of things and not suppressing that feedback because “someone else probably said something or they know”.

Sun is listening, you know. The mic is on, whether you realize it or not. Jonathan and his executive team, frankly all of Sun, have a long-running and well-established open door policy: whatever it is that you need to say can be said, and regardless of whether you get a response back or not, it will be received. This isn’t an ability to brush off or take lightly, but rather one to embrace and enjoy.

So, if you take anything away from this, know that Sun has its faults, sure, but it’s also a company that’s responsive, interested, and engaged with its customers. And that’s something we should all be thankful for: thankful for the culture brought to Sun by Scott McNealy, a culture that continues to grow under the leadership of Jonathan Schwartz. Sun isn’t the company you get fed up with and move on from; it’s the company you get fed up with, sit down and talk it over with, and then work towards a better and brighter future for everyone with, and that’s the company we love.