Frequent system hangs and freezes in macOS Mojave 10.14: Notes, reproducible errors, and possible workarounds
Like (too) many others, I’ve discovered Mojave to be a source of exasperation, particularly because it seems to lock up periodically, sometimes for minutes at a time, with no identifiable cause.
I’ve been closely watching this behavior, with the result that I’ve been able to reliably reproduce a system hang, make some surmises regarding the cause — at least in my case — and come up with a few suggestions that might work around the problem. Some of these suggestions are for users. Some are suggestions for Apple.
System: Mojave installed on a Mac mini (Late 2012) with a Fusion Drive (1 TB rotational + 128 GB SSD). About 120 GB of free space at time of install. 16 GB RAM.
Symptoms:
• Beachballing. All active apps lock up and beachball. This happens whether there’s a lot of RAM available (50%+), or very little (a few hundred MB).
• Switching to different open app windows works, but any open window remains nonresponsive until the system hang clears. (I do not run anything in fullscreen mode, so I have no idea if task switching in fullscreen works or not.)
• Windows are draggable and refresh as expected while being dragged, but none of their contents respond to user commands (scrollbars are nonresponsive, clicking selectable items results in no state change, etc.) until after the hang clears.
• Switching to another desktop rarely works. If it does, the system remains unresponsive on the new desktop until the hang clears.
• Switching to Mission Control does not work.
• Any new app will not launch until the hang is cleared.
• The tags which typically appear over hovered items in the Dock are slow to respond, or nonexistent.
• Clicking a stack in the Dock doesn’t result in the stack opening.
• Ctrl/right-clicking results in no popup menus, even where they’re expected.
• There is no indication in Activity Monitor of any background process that has run away with the processors, or swamped memory.
• Activity Monitor only reports the beachballed apps as “Not responding”. Force-quitting or getting system info on them is not possible; Activity Monitor’s capability to respond to user commands is just as hosed as everything else.
• User commands are queued, and acted on after the hang clears; so normal commands to switch desktops (as an example) execute in series after the hang is cleared.
• Copy operations on large files sometimes bog, then freeze, and throw (-36) errors. This was, for me, a significant clue.
Attempted fixes/workarounds:
1. Rebuilding file/folder permissions in Home folder, and at the root volume level.
2. Booting to safe mode.
3. Flushing caches.
4. Resetting PRAM.
None of these had any corrective effect.
Beachballing still occurred in safe mode, but its frequency was noticeably reduced. There were no instances of mdworker running in Activity Monitor while in safe mode.
One thing that did help was disabling Spotlight indexing for the entire volume. This did not correct the system hangs, but it made them less frequent.
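For reference, one way to switch indexing off for a whole volume from Terminal is the `mdutil` tool (`mdutil -i off <volume>`, run as root). Below is a small Python sketch that builds that command and, by default, only returns it without running anything. The helper names and the `dry_run` flag are hypothetical conveniences of mine, not part of any Apple tooling.

```python
import subprocess


def spotlight_indexing_command(volume="/", enable=False):
    """Return the argv that turns Spotlight indexing on or off for one volume."""
    return ["sudo", "mdutil", "-i", "on" if enable else "off", volume]


def set_spotlight_indexing(volume="/", enable=False, dry_run=True):
    """Build the mdutil command; execute it only when dry_run is False."""
    cmd = spotlight_indexing_command(volume, enable)
    if dry_run:
        # Nothing is executed; the caller can inspect or print the command.
        return cmd
    # Assumption: this runs on macOS, where mdutil exists and sudo can prompt.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(set_spotlight_indexing("/", enable=False)))
```

Calling `set_spotlight_indexing("/", enable=True, dry_run=False)` would turn indexing back on, which triggers a rebuild of the index.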
Reproducible error:
Large files, which existed on this volume prior to the install, ground to a halt when being copied from the internal HD to an external USB-connected hard drive, and to networked volumes. I have about 110 GB of ripped DVD files originally intended for my media server, which could be copied and purged to free up space. Mostly, these files copied without incident, but occasionally, the copy operation would slow to a crawl, then grind to a halt for anywhere from 15 seconds to a minute or more.
While that happened, all open apps began beachballing.
Eventually Finder would throw an error:
“The Finder can’t complete the operation because some data in ‘the name of the file being copied’ can’t be read or written. (Error code -36)” [OK]
Clicking OK cleared the error dialog, but it was usually upwards of half a minute before the beachballing stopped. Thereafter I could locate the file in Finder, select it, and trash it.
Initially, I just tried re-copying the file, but Finder consistently bogged, halted, and threw the (-36), at the same number of reported MB copied, each time. Clearly the files in question do, indeed, have problems, and something involved in file I/O cannot work past them.
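To pin down where a file goes bad without involving Finder, one option is to read it sequentially and note the first offset at which a read fails. This is a minimal Python sketch, assuming the failure surfaces as an `OSError` from `read()`. The function name and the 1 MB chunk size are my own choices. If the underlying read simply blocks during a hang, this will block too.

```python
def first_unreadable_offset(path, chunk_size=1024 * 1024):
    """Read `path` front to back in chunks.

    Returns the byte offset of the first chunk whose read raises OSError,
    or None if the whole file reads cleanly.
    """
    offset = 0
    # buffering=0 gives raw, unbuffered reads so the offset stays exact.
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError:
                return offset
            if not chunk:  # empty read means end of file
                return None
            offset += len(chunk)
```

If the returned offset matches the number of MB Finder reports at the point of failure, that supports the idea that the problem sits at one fixed spot in the file rather than in the destination volume.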
Relevant observations:
• As far as I know, macOS does some file defragmenting in the background.
• Spotlight goes read/write crazy on any new system install, entirely rebuilding its index.
• Most of the fusion drive is rotational media, and therefore slower, relative to SSD.
• Most of the fusion drive is occupied by data.
Surmise:
Some particularly large files might be nominally incomplete or mildly corrupted. This incompleteness or minor corruption might be unnoticeable to the user or to most apps, but some process in Finder (or APFS) may consider the corruption irreconcilable.
These files may have fragments scattered, in bits and pieces, all over the drive.
It is conceivable these files became corrupted when normal system defragmenting was interrupted partway through: when I first encountered these system hangs, I toggled the power on more than one occasion to force a reboot.
Non-user-facing processes such as Spotlight indexing or file defragmenting might be encountering these files, and discovering the same errors that surface during file-copy operations, with the result being that the system hangs without any user-visible or user-initiated cause.
These symptoms may manifest on SSD volumes, as well. SSD is much faster than rotational media, but all that would mean is the duration of system hangs would be reduced. They’d still happen.
If my surmise is correct and global system hangs are all down to failed file I/O, creating a new user account would not correct the problem, because the corrupted data would still be present on the drive, and macOS would still be attempting to index it, or defrag it, or both. The new user account wouldn’t see or own any of that data, but the machine would still be working on it in the background, as part of normal global system processes.
Possible user steps to ameliorate (untested as of now; I’m still stuck in [-36] purgatory while trying to copy and trash):
1. Increase free space on the volume to 20% or more.
2. Disable Spotlight indexing of large files (e.g., photos or movies).
3. Disable system sleep and let macOS churn through the data over the course of multiple days/nights.
4. Copy large, relatively unimportant files to an external volume, and remove them from the Mini. (Currently in progress; this was how I discovered the reproducible error.)
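The 20% target in step 1 is easy to check from a script. This is a rough Python sketch using the standard library’s `shutil.disk_usage`. The function names and the default threshold are assumptions for illustration, and the figure it reports may not match what Finder shows.

```python
import shutil


def free_fraction(path="/"):
    """Return free space on the volume containing `path` as a fraction (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def meets_free_space_target(path="/", target=0.20):
    """True if the volume containing `path` has at least `target` free."""
    return free_fraction(path) >= target


if __name__ == "__main__":
    print(f"Free: {free_fraction('/'):.1%}, target met: {meets_free_space_target('/')}")
```

Running it before and after a batch of copy-and-trash operations shows how far the volume still is from the target.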
Suggestions for Apple:
1. Robustify error-handling in file I/O operations. Finder (or APFS) should not grind to a halt, and bog the entire rest of the machine, when data appears to be missing or damaged during file I/O.
2. Analyze Spotlight indexing relative to background defragmenting, and do not allow both processes to be running at the same time, particularly when there isn’t much free space on the drive. Prioritize defragmenting over Spotlight indexing.
3. Make mdworker and Spotlight a little smarter about disk usage. When a volume is at 90% capacity, Spotlight should not be as prominent a process, and should not be actively reading huge amounts of data, then writing out extensive cache files. Free space is far more precious than file indexing, on a largely-full volume. Either have Spotlight limit itself to only a few processes at a time in this case, or provide users with a “slim” option that lets us search by filename, but not content. In fact:
4. Give users an option in Spotlight so it only builds an index of filenames, and does not analyze documents for metadata or any other searchable content at all. When I’m searching for a document, I’m searching by name. I don’t need or want to see a list of every text or Web file on my drive that contains every word in my search parameters. This might be useful in Mail, but it is not useful in Finder. Spotlight does not need to be an exhaustive grep tool with a GUI front end. If I want grep, I have Terminal.
5. Allow error queuing for user-initiated file-copy operations. I should not have to respond to a modal dialog, and re-initiate the copy, when one document of 500 or so throws a (-36) error. I’d rather the system kept a running tally of what failed, continued copying with the next file, and presented me with a list of failed files at the end of the entire operation.
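The behavior asked for in suggestion 5 can be approximated today with a script. This is a minimal Python sketch of a copy loop that records failures and keeps going. The function name and the shape of the return value are my own choices, and it says nothing about how Finder works internally.

```python
import shutil
from pathlib import Path


def copy_with_error_tally(sources, dest_dir):
    """Copy each file in `sources` into `dest_dir`, continuing past failures.

    Returns (copied, failed): `copied` is a list of source paths that copied
    cleanly, `failed` is a list of (source path, error message) pairs.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, failed = [], []
    for src in map(Path, sources):
        try:
            shutil.copy2(src, dest / src.name)
            copied.append(src)
        except OSError as exc:
            # Record the failure and move on to the next file. A partial
            # destination file may remain and should be checked by hand.
            failed.append((src, str(exc)))
    return copied, failed


if __name__ == "__main__":
    import sys

    # Usage: python copy_tally.py DEST_DIR FILE [FILE ...]
    ok, bad = copy_with_error_tally(sys.argv[2:], sys.argv[1])
    print(f"{len(ok)} copied, {len(bad)} failed")
    for path, message in bad:
        print(f"  {path}: {message}")
```

The list of failures printed at the end stands in for the summary dialog I’d like Finder to show.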