The Secret Behind MediaTemple’s Grid-Server
Posted on October 22, 2006
There has been a lot of buzz this week around (mt) MediaTemple’s latest offering: (gs) Grid Server. I listened to a podcast at TechCrunch and was really sucked in by the marketing speak about the offering. But as a sysadmin I wanted to know how it worked. The key to the product is that you set up your environment once and it’s “automatically deployed on the grid”, so that even your little site benefits from the collective resources of the grid.
I had to know how it worked. I called MediaTemple but they wouldn’t tell me anything… frankly I don’t think the guy I talked to even knew. So I bought an account to look for myself and found something very, very interesting. The real story isn’t MediaTemple’s Grid Server, it’s actually BlueArc.
The secret behind GS isn’t revolutionary, but it is clever. Basically they have, at last check, 17 systems running Debian. Thanks to a BlueArc press release I know they bought a Titan back around the middle of this year. An HP success story I found via Google shows they had a relationship with HP, which leads me to believe they are still using HP systems. Based on data from /proc, I think they are specifically using HP ProLiant DL360 2.00GHz G5 servers. Interestingly, there are 4 Xeon 2GHz cores and only 2GB of memory per system. There is no local storage; instead the systems boot a root filesystem via NFS, and user storage is also mounted over NFS.
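A rough sketch of the kind of /proc inspection described above, not the exact commands used on the (gs) account. It assumes a Linux host with the standard /proc/cpuinfo, /proc/meminfo and /proc/mounts files; the function names are made up for illustration.

```python
from pathlib import Path


def cpu_models(cpuinfo_text):
    """Return the 'model name' value of every logical CPU listed."""
    models = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models.append(value.strip())
    return models


def mem_total_kb(meminfo_text):
    """Return MemTotal in kB, or None if the field is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


def nfs_mounts(mounts_text):
    """Return (mount_point, source) for every NFS filesystem."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            found.append((fields[1], fields[0]))
    return found


if __name__ == "__main__":
    proc = Path("/proc")
    if (proc / "cpuinfo").exists():
        print("CPUs:", cpu_models((proc / "cpuinfo").read_text()))
        print("MemTotal kB:", mem_total_kb((proc / "meminfo").read_text()))
        print("NFS mounts:", nfs_mounts((proc / "mounts").read_text()))
```

On a node like the ones described, an NFS entry for "/" in the last list is what gives away the diskless, NFS-root setup.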
The Grid magic is this: store all user data on NFS so that no matter which system you connect to, you can access the data. Then spread your vhost configuration to all hosts in the “grid”, so that any system can serve your data. This system is therefore highly scalable, because adding an additional node to the “grid” is trivial, and reliable, because if one system dies, big deal. But this means that you require two things to make it work: really good load balancers and really good NFS storage. And by good I mean very reliable and extremely fast.
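The idea can be modeled in a few lines. This is a toy sketch, not (mt)’s implementation: a dictionary stands in for the NFS share, one shared vhost table stands in for the configuration pushed to every node, and a simple round-robin loop stands in for the load balancer. All names here are hypothetical.

```python
class Grid:
    """Toy model: stateless nodes, shared storage, shared vhost config."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        self.down = set()          # nodes currently failed
        self.shared_files = {}     # stands in for the NFS share
        self.vhosts = {}           # identical on every node
        self._next = 0             # round-robin position

    def add_node(self, name):
        # A new node needs no data copied to it: it mounts the same
        # storage and reads the same vhost configuration.
        self.nodes.append(name)

    def publish(self, hostname, content):
        path = f"/shared/{hostname}/index.html"
        self.shared_files[path] = content
        self.vhosts[hostname] = path

    def serve(self, hostname):
        """Pick the next healthy node and serve the site from shared storage."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node not in self.down:
                return node, self.shared_files[self.vhosts[hostname]]
        raise RuntimeError("no healthy nodes left")


if __name__ == "__main__":
    grid = Grid(["node1", "node2", "node3"])
    grid.publish("example.test", "hello from the grid")
    grid.down.add("node2")  # one system dies, big deal
    for _ in range(4):
        print(grid.serve("example.test"))
```

Note what the model leaves out: the shared dictionary and the balancer loop never fail here, which is exactly why the real thing depends so heavily on the storage and the load balancers.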
And that’s where BlueArc Titan fits into this story… without the performance offered by BlueArc Titan, the Grid Server concept just can’t work and becomes a disaster. Putting all user data on the Titan is a big vote of confidence, but putting all the root filesystems on it says something even more telling. No doubt the idea of putting root filesystems on NFS was not to reduce components in the servers but to facilitate provisioning and change management by means of cloning a “golden root” and rebooting each machine.
I have no idea what load balancer they are using. Apparently whoever makes it isn’t putting their name in a press release. In a setup like this I’d only choose to go with F5 BIG-IP, but who knows. They do have Pound installed on each node, but I can’t imagine that they’d spend money on systems and storage but not on load balancers.
Of course, this leaves one problem, especially if you’re a Ruby on Rails developer: PHP can be served by Apache on any host, but Rails apps use their own web servers (WEBrick, Mongrel, or lighttpd). That’s where the (mt) Containers come in. I’m less sure about how that works, and frankly less interested. Basically you create a little container (64M in the low-end account) within which you set up your Mongrels, and that then starts the binaries on some number of grid nodes. This applies to any application that requires running binaries, so Java developers aren’t welcome (until they design Tomcat/Geronimo containers). If you’re a developer, look before you leap: (gs) might be great for static content and Apache CGI, but otherwise look elsewhere. These containers are by no means to be confused with real containers or what many call “Virtual Private Servers” (VPS) or even “Virtual Dedicated Servers”.
Back to BlueArc, the real story here: I’m impressed that (mt) trusted their solution to them. It’s a testament to the reputation BlueArc is building in the industry. I am a little interested in the configuration in terms of performance, because I found that with 8K blocks I get 102MB/s in a TextDrive Container (NFS on Thumper) vs 72MB/s in a MediaTemple Grid Server (NFS on Titan). I was shocked, actually; I would expect the BlueArc to blow away Thumper, but I’m withholding judgement for now. What I’ll be watching is how the performance changes over time, as (mt) moves more customers (new and old) into the “grid”. If I do a benchmark in 6 months, will I see the same performance or reduced? When there is maintenance or failure on the Titan (unlikely as that might be), will it take down the entire site? It shouldn’t, of course, but that depends on whether (mt) bought a redundant configuration. In short, the fate of (mt) rests squarely on that device… let’s see how things go.
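For anyone who wants to repeat the 8K-block comparison over time, here is a minimal sketch of a sequential write test. The tool and settings behind the 102MB/s and 72MB/s figures above are not stated, so this is an assumption-laden stand-in: it writes 8 KiB blocks to a temporary file in the given directory, forces the data out with fsync, and reports decimal megabytes per second. The function name and the 64 MiB default size are illustrative choices.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_bytes=64 * 1024 * 1024, block_size=8192):
    """Write total_bytes in block_size chunks and return MB/s (1 MB = 10**6 bytes)."""
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        # buffering=0 so each write() call issues one block_size request
        with os.fdopen(fd, "wb", buffering=0) as f:
            for _ in range(blocks):
                f.write(block)
            os.fsync(f.fileno())  # include the time to flush to the server
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_size) / elapsed / 1e6


if __name__ == "__main__":
    # Point this at a directory on the NFS mount you want to measure.
    print(f"{write_throughput_mb_s(tempfile.gettempdir()):.1f} MB/s")
```

A single run like this is noisy, and on a shared platform it measures your neighbors as much as the filer, so several runs at different times of day give a fairer picture.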