The Case for IPS
Posted on May 19, 2010
On the list of those who hate IPS, I am the first. Even so, it is a very long list, and one that will grow longer and longer as IPS ceases to affect only hobbyists and early adopters and becomes a reality in a proper Solaris GA release.
While IPS has been around for some time now, only recently did I actually fully understand its purpose. In meeting with an engineer at Sun (whom I shall not name, but who is not directly involved with IPS) I started into my years-old “Why IPS is an abomination” rant. However, during this vigorous back-and-forth debate, it finally clicked in my mind exactly why Solaris Engineering wants IPS so desperately. And, furthermore, why I am trying in vain to kill it.
You’ll recall back in time, when dinosaurs roamed the earth, that Bart Smaalders posted “Rethinking Patching”. This has been the real key, and I’ve known that, but for some reason his point was lost on me in the midst of his “Dim Sum” analogy and the takeaway phrase that ensued: “No more dim sum patching”. I’m sure some people really understood this and embraced it, but as for myself and many others I’ve talked with over the years, we somehow missed the essence of what he was pointing out and what its full ramifications would be.
Let’s put the “dim sum” analogy away. What he’s getting at here is that Solaris patch management is, and has been for a long time, a complete disaster. No one will dispute that. Proper patching requires that the environment be in a known, consistent state, so that applying a patch doesn’t inadvertently cause additional unexpected problems. But when you are installing dozens, perhaps hundreds, of patches over a period of time, the OS becomes a tangled mess. The practical reality is that administrators reject “Patch early, patch often” for the more reliable “Don’t fix it if it ain’t broken.”
All that gets so much worse when people don’t simply patch the systems as Sun tells them to (through maintenance updates) but instead pick-n-pull only the patches they want. Doing any kind of intelligent patch management is almost impossible because there are so many dependencies and possible conflicts, only a handful of which Sun even knows about.
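To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only an illustration, not anything taken from Sun or from Bart’s post: the function names are made up, and it assumes every patch is independently optional, which overstates things because real dependencies rule out some combinations. Treat the first figure as an upper bound.

```python
def pick_n_pull_states(num_patches: int) -> int:
    """Upper bound on distinct patch sets when each patch is optional.

    Assumption: every patch can be applied or skipped independently.
    Real patch dependencies reduce this number, but not by enough to
    make the space testable.
    """
    return 2 ** num_patches


def known_states(num_updates: int) -> int:
    """Distinct states when a system only moves between whole updates.

    The initial install counts as one state, and each update adds one.
    """
    return num_updates + 1


if __name__ == "__main__":
    # Compare the two models for a few hypothetical patch counts.
    for count in (10, 50, 100):
        print(count, pick_n_pull_states(count), known_states(count))
```

Even with a modest number of patches, the pick-n-pull model allows far more system states than anyone could test, while moving between whole updates keeps the count linear.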
Therefore, IPS first and foremost is about solving that very problem of patch management and going from one known state to another known state.
That’s it. It’s not about being easier to use (but that is a selling point). It’s not about being more like Linux (but that is a selling point). It’s not about moving from a bunch of DVDs to a network repository (but that is a selling point). It’s all about unf**king patch management on Solaris for enterprise customers.
But it’s not just about IPS. UFS isn’t an option for your root file system anymore; why? Because ZFS Boot Environments (BEs), which allow point-in-time rollback following an update, are just as critical to patch management and establishing “steady state” as IPS is.
Same thing goes for Sparse Zones. Managing dependencies between the Zones and the Global Zone is complex, and IPS makes it even more so. Therefore, because IPS can already manage non-root images and ZFS Boot Environments can be extended to Zones, why not simply embrace the same model there? Those of us who would say, “Keep zones simple, let me manage the deps!” are akin to a customer saying “I still like the old pick-n-pull model, please let me continue.”
Additionally, although many of us value Sparse Zones for their security… Solaris Engineering (on the whole) sees them primarily as a way to conserve disk space. Therefore, they will contend, ZFS Dedup relieves them of the requirement to support Sparse Zones.
For those of you who, like myself, want IPS removed from Solaris completely… we’re almost certainly out of luck. The ONNV source code was changed, circa snv_131, to stop producing SysV packages from the Makefiles. It’s all IPS-ified now. That is why SX:CE died. Prior to that point, SysV packages would be produced from a build and then converted into IPS packages; this is why the IPS package names were the same as the legacy SysV packages (SUNWsomething). With that major code change in place, it’s not simply a matter of flicking the switch back to SysV.
Because Jumpstart is a harness around SysV packages, it dies too. You’ll notice that a single-user install from CD/DVD is actually just a local jumpstart… it’s the same process essentially. So when the old installer was replaced with Caiman (originally billed as an easy-to-use graphical installer), Caiman now had to pick up the slack. Since retrofitting Jumpstart for IPS would essentially invalidate the work done on Caiman, Caiman was extended to become AI, the Automated Installer.
So you can see that in untangling the patching mess, we’ve created a whole new breed of technologies which are simply a different kind of mess. And this isn’t one we can just petition to have removed. Solaris Engineering is right to try to ease the suffering of enterprise customers, and should be applauded for their valiant efforts to go the distance to solve it… even if it causes untold new problems for customers.
So, this isn’t about the continual Linux-ification of Solaris… that’s why they didn’t choose Yum or some other existing Linux packaging system. This is about patching. And if you, like myself, just accept that explanation and take it to heart, all the other weird and bizarre changes they are making seem to make sense. (We may still violently disagree, but they make sense.)