Crashing Solaris for Fun and Profit
Posted on June 9, 2009
Crashing is the wrong title, actually. We’re talking about panics. It’s sort of like saying “hacking” when you mean “cracking”. A “crash” is when an OS performs some operation that brings the system down, typically forcing a reboot. Solaris differs from its rival Linux in that 99% of the time such an event will be caught by the OS and handled as a “panic” instead of an uncontrolled crash.
While it’s not a sexy feature of Solaris, panics are extremely important. The reality is that sh*t happens; it just does. When it does happen, you want to collect as much information about the event as possible so you can fix the problem that resulted in badness. The advantage of a panic over a crash is the crash dump, which can be analyzed for the cause. This is why panics are a very good thing indeed. As I like to say, “If you’ve gotta crash, fine, but you’d better give me a reason!”
This blog entry is concerned with the 1% of Solaris issues that don’t result in a panic but that you wish would. In between the concepts of a “crash” and a “panic” is that dreaded situation we call “hung”. A “hung” or “locked” or “wedged” system is one that is still running, technically, but is otherwise unusable. Typically this is the result of the kernel being OK but userland being trashed beyond repair. Perhaps the two most common causes are abuse cases such as memory/swap exhaustion or fork bombs, where the kernel is still running but no new processes can be spawned to let you see what’s happening. The normal way to deal with this is to reboot the system, either via IPMI or by sending some poor soul (typically, you) down to the data center to unceremoniously press the big button.
So here’s the problem with rebooting a system this way: it doesn’t panic. You might get the system back up, but you have no idea what happened unless something in the logs tips you off. If there are no relevant log entries, you simply have to shrug and pray that it doesn’t happen again. So what can you do about it? On SPARC systems you’d hit “STOP-A” and sync the box to force a dump. But what about Solaris/X86? What do you do then?
Thankfully there are three very handy ways of dealing with such situations. They can be combined or used individually. Let’s take them in turn.
Please note! This entry applies only to Solaris/X86!
Panic on NMI
Intel introduced the concept of Non-Maskable Interrupts a long while back. These NMIs are extremely high priority and cannot be blocked by the OS. While I’ve had trouble fully researching them, their most common use is to kick an otherwise unresponsive system into a “diagnostic” mode. On some systems an NMI is triggered by a jumper, on others by a button, and on yet others by IPMI or SP commands. In the case of Solaris/X86 there are two tunables that can make the OS react to an NMI: one causes a panic, the other drops the system into a kmdb session for live debugging. By default Solaris will simply print a message to the console saying an NMI was received, but otherwise do nothing.
The first of these two tunables goes in /etc/system. It is not dynamic, so you must reboot for it to take effect. It is:
Once this is added and the system rebooted, the receipt of an NMI will cause an immediate panic followed by a reboot. The most common way to invoke this behavior is via the IPMI command: ipmitool -I lanplus -H somehost -U root chassis power diag. Remember, if you get a seriously stuck system you’re going to reboot the box anyway, typically via IPMI, so instead of using “power cycle” you choose “power diag” and have some tasty data to dig through (or send to Sun).
I like to call this feature “panic on demand”. 🙂
Enter kmdb On NMI
In addition to, or instead of, the panic option above, we can use the following tunable to drop the system into the kmdb kernel debugger on NMI receipt:
If it is set along with the panic option above, the system will panic and then drop into the debugger, which is the best option. But please know that this will only work if kmdb is loaded at boot time, by adding the -k kernel argument via your bootloader (i.e., GRUB). If you’re working with a production system this might not be something you want hanging over your head all the time, so in my mind it’s a more developer-oriented solution. If you’re writing drivers you’ll certainly want to keep kmdb loaded, but everyone else will more likely prefer the “panic and reboot” option above.
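To make the combinations explicit, here is a toy Python model of the NMI behavior as described in this post. It is not Solaris source: panic_on_nmi and kmdb_on_nmi are hypothetical stand-ins for the two /etc/system tunables, kmdb_loaded models booting with -k, and the fallback when kmdb is requested but not loaded (default console message) is my assumption rather than something stated above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NmiConfig:
    panic_on_nmi: bool = False  # hypothetical stand-in for the "panic on NMI" tunable
    kmdb_on_nmi: bool = False   # hypothetical stand-in for the "kmdb on NMI" tunable
    kmdb_loaded: bool = False   # True if the kernel was booted with -k


def on_nmi(cfg: NmiConfig) -> list[str]:
    """Return the sequence of actions taken on NMI receipt, per this post's description."""
    # kmdb can only be entered if it was loaded at boot time.
    enter_kmdb = cfg.kmdb_on_nmi and cfg.kmdb_loaded
    if cfg.panic_on_nmi:
        # Panic; with kmdb available the system then drops into the debugger,
        # otherwise it reboots as usual.
        return ["panic", "enter kmdb" if enter_kmdb else "reboot"]
    if enter_kmdb:
        return ["enter kmdb"]
    # Default (and, by assumption, kmdb requested but not loaded): log and carry on.
    return ["console message"]


if __name__ == "__main__":
    print(on_nmi(NmiConfig(panic_on_nmi=True, kmdb_on_nmi=True, kmdb_loaded=True)))
```

The production-friendly setting in this model is panic_on_nmi alone, which yields a dump and a reboot without keeping a debugger resident.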
Snooping: The Deadman Watchdog Timer
Ever watch or read The Abyss? There was a mini-sub that would automatically float to the surface with surveillance tapes and such if a timer wasn’t reset every 12 hours, in hopes that if anything went wrong there would at least be a partial record of events. Watchdog timers are a similar concept: the kernel periodically checks that it is still making progress, and if it stops, the white flag is waved and it panics.
It’s rare for a code comment to be the best documentation, but here the comment in the code (line 163) is the best explanation: “Setting ‘snooping’ to a non-zero value will cause a deadman panic if snoop_interval microseconds elapse without lbolt increasing. The default snoop_interval is 50 seconds.” (lbolt is the kernel’s count of clock ticks since boot.)
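The rule in that comment can be sketched as a small user-space model. This is a simplification for illustration, not the kernel’s deadman implementation: the Deadman class and its check() method are hypothetical, lbolt is just an integer you feed in, and all times are microseconds.

```python
from __future__ import annotations


class Deadman:
    """Simplified model of the 'snooping' deadman check (not kernel code).

    lbolt stands in for the kernel's tick counter; all times are microseconds.
    """

    def __init__(self, snoop_interval_us: int = 50_000_000) -> None:
        self.snoop_interval_us = snoop_interval_us  # default: 50 seconds
        self.last_lbolt: int | None = None
        self.last_advance_us = 0

    def check(self, now_us: int, lbolt: int) -> bool:
        """Return True if a deadman panic would fire at time now_us."""
        if self.last_lbolt is None or lbolt > self.last_lbolt:
            # lbolt moved forward: the clock is alive, so restart the countdown.
            self.last_lbolt = lbolt
            self.last_advance_us = now_us
            return False
        # lbolt is stuck: panic once a full snoop_interval has elapsed.
        return now_us - self.last_advance_us >= self.snoop_interval_us


if __name__ == "__main__":
    d = Deadman()
    d.check(0, 100)
    d.check(10_000_000, 101)
    print(d.check(59_000_000, 101))  # False: lbolt stalled for 49 s
    print(d.check(60_000_000, 101))  # True: lbolt stalled for 50 s
```

The key point the model captures is that only a stalled lbolt matters; a busy but ticking system never trips the deadman.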
So we simply add the following tunables to /etc/system:
set snooping=1
set snoop_interval=90000000
The first line enables snooping. The second line changes the default 50-second interval to 90 seconds. I don’t see any reason that 50 seconds isn’t long enough, but if you want to be paranoid and use a 5-minute interval you can: just convert 300 seconds to microseconds (300000000) and reboot to lock it in.
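Since snoop_interval is in microseconds, a tiny helper avoids miscounting zeros. This is a hypothetical convenience function, not part of Solaris; it only emits the two /etc/system lines shown above for whatever interval you choose.

```python
from __future__ import annotations

MICROSECONDS_PER_SECOND = 1_000_000


def snooping_tunables(interval_seconds: int) -> list[str]:
    """Return /etc/system lines that enable the deadman timer with the given interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # snoop_interval is expressed in microseconds.
    interval_us = interval_seconds * MICROSECONDS_PER_SECOND
    return ["set snooping=1", f"set snoop_interval={interval_us}"]


if __name__ == "__main__":
    # A 5-minute interval, as in the text above.
    for line in snooping_tunables(5 * 60):
        print(line)
```

Run as-is, it prints set snooping=1 followed by set snoop_interval=300000000.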
Snooping has been safe in all my testing to date, but obviously it will feel risky to the casual sysadmin, so this is not something I’d enable by default. If, however, you have a system that mysteriously goes into a dead hang in the middle of the night, this is a better option than being woken up just to testify that it did actually reboot and you still have no idea why. 🙂
As stated, the best way to poke NMI once you’re ready for it is via IPMI (tested on Sun and Dell):
# ipmitool -I lanplus -H somebox -U root chassis power diag
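If you need to do this across many boxes, a small wrapper keeps the invocation consistent. This is a sketch that assumes ipmitool is installed and on your PATH; nmi_command() and send_nmi() are hypothetical helper names, and the only ipmitool options used are the ones in the command above.

```python
from __future__ import annotations

import shlex
import subprocess


def nmi_command(host: str, user: str = "root") -> list[str]:
    """Build the argv for the 'chassis power diag' ipmitool call shown above."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "chassis", "power", "diag"]


def send_nmi(host: str, user: str = "root") -> int:
    """Run ipmitool with the terminal attached (so any password prompt reaches you).

    WARNING: on a box configured to panic on NMI, this panics it.
    """
    return subprocess.run(nmi_command(host, user), check=False).returncode


if __name__ == "__main__":
    # Dry run: print the command instead of sending the NMI.
    print(shlex.join(nmi_command("somebox")))
```

Keeping the argv builder separate from the call that actually fires the NMI lets you review (or log) exactly what will be sent before you pull the trigger.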
If you have a moderately recent version of ILOM you can poke the /SP/diag/generate_host_nmi value like so:
-> cd /SP/diag
/SP/diag

-> show

 /SP/diag
    Targets:
        snapshot

    Properties:
        generate_host_nmi = (Cannot show property)
        state = disabled

    Commands:
        cd
        set
        show

-> set generate_host_nmi=true
Set 'generate_host_nmi' to 'true'
Once you’ve got a dump, things are on the uptick, but given that you may not be a kernel developer (and, if you’re wise, have no desire to become one), you have some options. The first is to send the crash dump to Sun and hope they come back with a good answer. The second is to use this as your opportunity to dig deep into the guts of Solaris and learn something interesting. I recommend reading my blog entries from some time ago, Solaris Core Analysis, Part 1: mdb and Part 2: Solaris CAT; reading The Solaris Operating System on x86 Platforms: Crashdump Analysis Operating System Internals (PDF); getting the latest SolarisCAT; and, if possible, attending one of Max Bruning’s courses.
Finally, a shout out to my p33ps in #opensolaris IRC for their help with my testing and research on NMI, in particular “LONGCAT”.