One for the Mac OS X hackers amongst us - many apologies if this post is overly techy for some of the contributors here.
I'm running 10.6.8 up to date on a variety of machines, the vast majority being genuine Apple hardware. Main workstation is a proper Mac Pro 3,1 but it has a LOT of peripherals and a very heavily loaded USB bus (30" ACD, 27" LED CD, two powered hubs and lots of accessories including a random FPGA running specialised software).
The main workstation is relatively stable and reliable. However, this Finder error is still present and occurs sporadically. Sadly I have no answer to the problem outside of the 'Microsoft Way' (i.e. reboot) which is, to me, entirely unsatisfactory for a Unix machine. But I do *think* I've narrowed it down to one specific issue.
When an external volume is connected, regardless of filesystem or means of connection (USB flash drives, USB enclosures for mechanical drives, network shares over SMB, NFS or AFP, even extremes like iSCSI endpoints, FireWire enclosures (400/800, mechanical, SSD, custom RAID), eSATA interfaces), the usual Unix mount process kicks off, but on OS X it is controlled by an OS X-specific daemon called diskarbitrationd. Once the new volume is mounted, other processes hook into it. Often the filesystem is checked for errors (common to most Unix flavours), and OS X-specific processes join the chain as standard: fseventsd collects filesystem events and makes them available to other apps (so Finder doesn't have to poll an open folder when an external app has just put a new file there, and you never need to 'refresh' an open folder window), and mds (Spotlight search) opens the filesystem for indexing.
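For anyone wanting to see what is actually attached before and after plugging something in, here is a minimal sketch that pulls the /Volumes entries out of the mount table. It assumes the usual OS X `mount` output format ("device on mountpoint (fstype, options)"); the function name volumes_from_mount is just something I made up for illustration.

```shell
# Read OS X-style `mount` output on stdin and print "fstype mountpoint"
# for every entry mounted under /Volumes. Mount points containing
# spaces are kept intact. (volumes_from_mount is a hypothetical name.)
volumes_from_mount() {
    sed -n 's|^.* on \(/Volumes/.*\) (\([a-z0-9_]*\).*$|\2 \1|p'
}
```

Typical use would be `mount | volumes_from_mount`, run once before and once after connecting the drive, then comparing the two lists.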
I find that the Finder lockup occurs only when there's a problem unmounting the external volume and umount hangs. That includes the case where the volume simply disappears and OS X tries to recover by forcing a disconnect (umount -f), but the endpoint is no longer there. Looking at the process list, there's often an /sbin/umount -f /Volumes/Whatever command still running, long after the volume concerned disappeared from the Finder and from /Volumes, and both fseventsd and diskarbitrationd (and mds, if running) are stuck in the 'U' state reported by ps, i.e. Uninterruptible Wait. As far as the user is concerned this is a hang: no other volume-related messages can be passed to diskarbitrationd, hence the Finder won't start (even if you start it from the Terminal).
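If you want to check whether your machine is in the same state, something like the following should pick out the stuck processes. It's only a sketch: it assumes ps-style output with the state in the second column, and relies on the BSD/OS X convention that the first letter of STAT is 'U' for uninterruptible wait (Linux uses 'D' instead, hence the optional argument). The function name stuck_procs is my own invention.

```shell
# Read `ps` output on stdin (header line first, state in column 2) and
# print only the rows whose state starts with the given letter.
# Default is U, the OS X marker for uninterruptible wait.
# (stuck_procs is a hypothetical helper name.)
stuck_procs() {
    awk -v s="${1:-U}" 'NR > 1 && substr($2, 1, 1) == s'
}
```

On OS X this would be run as `ps -axo pid,stat,wchan,comm | stuck_procs`, preferably as root so every process is listed. The wchan column shows the kernel wait channel, which may hint at what each process is blocked on. In the situation described above I'd expect umount, diskarbitrationd and fseventsd to show up in the output.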
I've tried doing a kill -9 on the stuck umount process as root but, since it's waiting on a hardware ACK that won't ever arrive, it doesn't die. Hence it doesn't send messages down the chain to fseventsd and diskarbitrationd, and we have the Finder problems.
There should be some way of forcing diskarbitrationd to 'abandon' a particular volume - releasing the locked wait for a reply from the dead hardware (in my case, dead USB flashdrives are the VAST majority of causes of this problem). I understand that this would be very dangerous - passing this command to a connected, alive volume would probably cause corruption - but plenty of other Unix commands can trash the system.
Does anyone here with better OS X hacker skills than me know whether diskarbitrationd can be passed such a message, or whether fseventsd can be used to pass the required 'this volume's gone, mate, give up on it' message to the main daemon? The trouble is that both are in 'uninterruptible wait' state, so they probably won't respond to any signal at all, not even SIGKILL (the stuck umount certainly ignores kill -9)...
I'm a bit wary of simply killing core daemons like diskarbitrationd since I could end up corrupting every connected volume (and I've got a lot going on)...
There may be other very specific cases that are different, but I believe the majority of these Finder problems are caused by this chain of events resulting in an Uninterruptible Wait for diskarbitrationd. Simply releasing the lock would let everything flow again without a reboot... well at least I'd hope so, for a properly designed Unix system.
It's obviously a fundamental problem since we've got 46 pages of commentary and it's not just happening to newbies or people with obviously bad hardware. I'm not accepting the 'solution' of 'upgrade to Lion' - besides, someone here has said that it still happens with Lion. I may try the 'nuclear option' of killing either fseventsd or diskarbitrationd, but I wouldn't recommend it. Really I need a response from an OS X core Unix system engineer who knows a LOT more about the OS X underbelly than I do. I'm no guru, but I'd happily accept a solution that involves tricky CLI stuff for the time being...