About 6 years ago I got tired of fixing problems with Tamarah’s Windows/Linux box and decided to pay the money for a 15″ PowerBook. It was an excellent investment: she could work on the couch, with no more lockups and reboots in Windows or mysterious “Bennnnnnn!” problems in Linux. Since then she’s upgraded to a black MacBook, and when I joined Joyent they provided me with a MacBook Pro (which I’m typing on now). So far each of these 3 laptops has lost at least one drive. Since we’ve fallen in love with iTunes and iPhoto these drive failures have been a major blow, and prior to Leopard’s Time Machine we didn’t do regular backups.
This post will refer solely to drives for personal use. In the datacenter you should be using RAID and/or some other backup or redundancy method, in which case a single drive failure isn’t something you waste time trying to analyze or fix.
I’ve run into 3 major types of drive failure:
- PCB Failure: A case in which the PCB has been “fried”. This happened dramatically once when I connected an IDE drive to a system and let the disk rest, upside down, on top of the case. It ran fine for a minute and then, pop/spark, there was a hole burned in a chip on the PCB. In this case the only solution is to go to eBay, buy an identical drive, and swap the PCB.
- Click of Death: This means catastrophic damage to a drive. The head is unable to position itself or read data and sweeps the platters in a sort of seizure. This is the sort of problem that likely requires you to open the drive or spend big bucks.
- Damaged Cylinders: This is the kind of problem where the drive seems fine and mounts, and you can read for a bit, but then it hits some area of the platters where it freaks out and eventually spins up and down. This is most clearly seen when you image the drive with dd and it hits some point and exits once the retries are exhausted.
Information on drive forensics and recovery is sparse. You tend to get one of three answers:
- “d00d, totally put it in the freezer and then try it!” Variations depend on how you should protect against condensation; the best I’ve heard is to pack the drive in minute-rice.
- “Send it to DriveSavers” (or another recovery company). This is super expensive, anywhere from $600 to beyond $2,000. You send them the failed disk and optionally a new drive to restore to. This can take weeks and is only for super extreme cases.
- “Just download tool xyz..” There are lots of software solutions for do-it-yourself drive recovery; most are old DOS-based programs recommended on forums populated largely by Windows users.
In my most recent failure, the drive died one day for seemingly no reason. There was no impact or horror story; the OS just locked up, I rebooted, and the OS would start to load and then just drift into an infinite slumber. I went through the painstaking process of replacing the drive in my MacBook Pro and re-installed everything from scratch. Once back up and running I put the old drive in a USB enclosure and attempted to image it using dd. On every attempt it would get 19GB into the drive and then give up.
This kind of problem is the easiest to deal with. There are dd-like tools built for exactly this, namely GNU ddrescue, which works much like dd but, instead of failing on bad blocks, moves on after a number of retries until it has read the whole disk, for better or worse.
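To make the "skip the bad block and keep going" idea concrete, here is a minimal Python sketch. It is not how ddrescue is actually implemented, just an illustration of the behavior described above. The names `rescue_copy` and `image_device`, the 64 KiB block size, and the retry count are all made up for this example, and it assumes a Unix-like system where `os.pread` is available and where seeking to the end of the source reports its size.

```python
import os


def rescue_copy(read_at, total_size, dst, block_size=64 * 1024, retries=2):
    """Copy total_size bytes to dst one block at a time.

    read_at(offset, length) must return bytes or raise OSError.
    A block that still fails after the retries (or comes back short)
    is written as zeros so later offsets stay aligned.
    Returns a list of (offset, length) tuples for the blocks that were lost.
    """
    bad_blocks = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        data = None
        for _ in range(retries + 1):
            try:
                data = read_at(offset, length)
                break
            except OSError:
                continue  # read error: try this block again
        if data is None or len(data) != length:
            # Give up on this block, record it, and move forward.
            bad_blocks.append((offset, length))
            data = b"\x00" * length
        dst.write(data)
        offset += length
    return bad_blocks


def image_device(src_path, dst_path, **kwargs):
    """Hypothetical wrapper: image src_path (file or device) into dst_path."""
    fd = os.open(src_path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the size of the source.
        total = os.lseek(fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            return rescue_copy(
                lambda off, n: os.pread(fd, n, off), total, dst, **kwargs
            )
    finally:
        os.close(fd)
```

Summing the lengths in the returned list gives a rough "how much did I lose" figure, similar in spirit to what the real tool reports. For an actual failing drive, use ddrescue itself; it is far smarter about ordering and retrying reads than this loop.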
In the case of my MacBook Pro drive I attached the USB enclosure to my OpenSolaris box, installed ddrescue, and imaged the drive to a file. Of the 80GB drive, the tool reported that I lost about 250MB. I then created a ZFS ZVol of 80GB, used traditional dd to copy the image file into the volume, and then exported it as an iSCSI target using iscsitadm. Using the globalSAN iSCSI Initiator for OS X I mounted the iSCSI target, and used OS X “Disk Utility” to verify and repair the HFS+ volume. All went well and I could then mount the volume and extract data. w00t!!! iSCSI Rules!
The tale of Tamarah’s MacBook drive didn’t end so happily. I had a backup of her laptop but it was really old. Glenn, our son, grabbed the laptop on the table, sending it crashing to the tile floor below, where it landed on the corner where the drive sits. The laptop was fine, but the drive was toast. After a Mac Genius showed us how to replace the drive I bought a new disk at Fry’s and got things installed and running again, but the old drive contained a lot of projects she wanted and, as is commonly the case, when I showed her the data from the old backup she was uncertain as to whether it was enough. This is the big problem of the “unknown”: when all your stuff is in one place you commonly forget what exactly is there.
I tried the USB enclosure trick but the drive wouldn’t even spin up… click of death. Given the sensitivity of the data I didn’t want to go Rambo on the disk, so we sat down and had a serious discussion about whether or not it was worth sending to a drive recovery company. The look on her face was enough to tell me what to do, and despite her guilt over the cost I sent it in. After a week and a half, the answer came back: “nothing we can do”. The tech was friendly and we had a good discussion about drive recovery, but long story short there was no hope and we were out $800. Frankly, for a lot of people that money is well spent, because at least you have exhausted all avenues and can mourn and get on with it.
When it comes to hardcore “swap the platters” style repair, things get dicey. As simplistic as hard drives seem, there are a lot of gotchas that you won’t be aware of until it’s too late. This is where Scott Moulton of MyHardDriveDied.com comes in. Scott has done two presentations, both found on YouTube, that provide a solid background in the black art of hardcore drive recovery used by most of the big bucks recovery companies.
The first presentation is Hard Drive Recovery, available in 5 parts. This is followed up (a year later) by Advanced Hard Drive Data Recovery, again in 5 parts. Both presentations include excellent Flash animations that illustrate the material perfectly.
Scott Moulton has done an amazing service to the community by providing detailed and experienced information regarding hard drive tinkering and recovery, including things you would never otherwise consider such as “Live PCB Swap”… watch and learn.
Of course, an ounce of prevention is blah blah blah. Technologies like OS X Time Machine and ZFS make backup easier and more realistic than ever before and, most importantly, reduce data duplication significantly. Online backup solutions are good, but frankly are only feasible on very high speed lines in this era where a trip to the beach can result in 2GB of new pictures. What I like best is the emergence of wi-fi USB drive products that allow tools like Time Machine to back up whenever they like without having to be specially hooked up to a drive… the more often you back up, the less there is to back up each time and the less hassle it is.
As a closing note… recovering the data is only part of the solution. I’ve found that some Apple apps like iPhoto and iTunes can be very unhappy when you attempt to import into a new system install. For instance, attempts to open my old iPhoto library have been unsuccessful. Thankfully I found iPhotoExtractor. As for iTunes, sadly iPods are not a backup solution… when attaching to a new system, even after authorizing, you may be told you need to delete and resync. In those cases, Sci-Fi Hi-Fi’s PodWorks can come to the rescue allowing you to extract and import music from otherwise unusable iPods.