Recovering a corrupted journaled Mac HFS+ volume
A system disk corrupted by a journaling fault that diskutil cannot deal with, because the volume will not mount writable. The normal solution is to reformat, but you may not need to do so...
Background:
System crashes - screen appears telling you to hold in power button.
System restarts.
A few hours later system crashes again.
When rebooting system - shortly after the startup sound and appearance of the progress indicator, system crashes again.
Safe startup (startup holding down shift key) does not work.
Verbose startup (startup holding down Apple-V) reports the following at the point where the system volume is mounted:
jnl: transaction too big
panic: We are hanging here...
The system volume is a journaled HFS+ system volume. At the time of the crash, an analytical application was running that generates a logfile that is typically in the 10-30GB region. After the run the file is gzip compressed (to about 2-6GB) and the original deleted.
Boot off the install CD and system still crashes at startup - at the point where it tries to mount the hard drive.
Open case, and discover that the SATA data connector is very loose - nearly off the pins - and that the locating tab has broken off in the plug - presumably with the vibration of the drive. After a lot of fiddling around, get the cable out from around the drive cage and manage to get data cable back on drive and drive powered.
Still will not boot, but will boot into single user mode (Apple-S) and can run fsck on the volume. fsck reports fixing up a number of problems. After a clean fsck run, try to boot again, but still no boot and the same error messages when trying to mount the volume in write mode (jnl: transaction too big).
diskutil just hangs because it cannot mount the drive for write, so I cannot use it to repair the volume.
At this stage I am spotting a flaw in the whole show: I have a volume that has integrity as far as fsck is concerned but will not mount writable, so I cannot use diskutil, which means I cannot turn off journaling. Everything I can find on the web says it's time to reformat. Can't be - I would never do that on a Linux system 🙂
After a lot of hunting around come across this utility:
/System/Library/Filesystems/hfs.fs/hfs.util
It even has its own man page.
So here is what I did (took about 4 hours of experimenting to find out the correct sequence):
Start in single user mode (note / partition is now mounted read-only)
Run fsck to check the volume is clean (fsck -fy)
Use hfs.util to switch off journaling (hfs.util -U)
Rerun fsck (took 2 or 3 runs to get a clean filesystem)
Mount the file system writable - it now mounts (mount -uw)
Boot system from install CD
After language selection, start up diskutil (GUI version)
Verify disk (still got some issues)
Repair disk (disk repairs on pass 2 - warnings about two files missing)
Re-enable journaling
Reboot system from hard disk (all is well)
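For anyone wanting to retrace the single user mode part of the sequence above, here is a rough sketch of those steps as a shell script. It is a dry run by default and only prints the commands. Treat it as my reading of the steps, not a tested recipe: the post only records the flag as "hfs.util -U", so passing / as the mount point argument is an assumption (check the hfs.util man page on your system), and the run helper and DRY_RUN variable are names made up for this illustration. The remaining steps (verify, repair, re-enable journaling) were done in the GUI from the install CD and are not covered here.

```shell
#!/bin/sh
# Sketch of the single user mode steps from the post.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Path as given in the post.
HFS_UTIL=/System/Library/Filesystems/hfs.fs/hfs.util

# Assumption: in single user mode the damaged volume is the root volume,
# mounted read-only at /, and hfs.util -U takes that mount point.
TARGET=/

# Hypothetical helper: echo the command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recover_steps() {
    # Check the volume (forced, answering yes to repairs).
    run /sbin/fsck -fy
    # Switch off journaling on the volume.
    run "$HFS_UTIL" -U "$TARGET"
    # Recheck; in the post this needed 2 or 3 runs, so repeat by hand
    # until fsck reports the filesystem clean.
    run /sbin/fsck -fy
    # Remount writable; with the journal off this now succeeds.
    run /sbin/mount -uw "$TARGET"
}

recover_steps
```

Run it once as is to see the command list, and only set DRY_RUN=0 once you are sure each command matches your own situation.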
Of course the issue of the snapped drive connector is not sorted, so I have had to order a new drive for the machine. When it arrives 'ditto' will duplicate the volume and we will be away safely again.
I have been using SATA drives for a long time - never had one break before, but I have read many times about it happening. I can only think that when I installed a second drive about two months ago, in the process of putting it in the cage I knocked the connector on the top drive and this is the result!
I think it also shows a bad flaw in diskutil - unless you can mount the volume writable, diskutil cannot help you. Seems to me there ought to be a force mode where diskutil can achieve what hfs.util does. Maybe there is and I just do not know about it...
G3, G4, G5, Xserve, PowerBook etc Mac OS X (10.4.7) What to do when diskutil will not work...