Disk utility stuck on checking catalog? Does it make sense to wait long?

Question

Level 1

14 points

One of our three 5 TB Time Machine backup disks got stuck during a backup in a seemingly never-ending operation that ran for more than a day. We later figured out that the server had crashed and restarted (we had noticed this at the time), which may have damaged the disk structure. No obvious `fsck` process was visible in Activity Monitor when filtering for `fs`.

First Aid did not report any issues on the USB device as a whole and finished successfully.
First Aid on the disk's only APFS volume (the Time Machine volume) failed because the volume could not be unmounted: a Finder operation was holding a file lock on it.

It is worth mentioning that the disk was encrypted and had been set up on Catalina by Time Machine itself from a regular, fully erased disk, with no dedicated container volume for its single backup volume. We will come back to this later.

We tried to:

Close all other applications, including a FileMaker Server running in the background.
Force-restart the Finder.
Identify any open files by running `lsof` in Terminal and filtering for `TimeMachine`, since the volume name contained that term: `lsof | grep -i TimeMachine`
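
The open-file search in the last step can be sketched as two small shell helpers. This is only a sketch under assumptions: the function names `open_files_matching` and `open_files_on_volume` are hypothetical and not part of any tool. It relies on `grep -i` for a case-insensitive match and on `lsof` listing every open file on a file system when it is given that file system's mount point.

```shell
# Hypothetical helpers (names are illustrative, not from any tool).

# List open files whose lsof line mentions a pattern, ignoring case.
open_files_matching() {
    lsof | grep -i -e "$1"
}

# List open files on one mounted volume. Given a mount point,
# lsof reports every open file on that file system.
open_files_on_volume() {
    lsof "$1"
}
```

For example, `open_files_matching TimeMachine` mirrors the command above, and `open_files_on_volume "/Volumes/TimeMachine_Example"` (a placeholder path) narrows the search to a single volume. Without root privileges, `lsof` may only show files held by the current user's processes.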

The volume still could not be ejected normally.

We then forced an eject from the Finder, which left the device connected but unmounted. Before that, Disk Utility had taken forever to allow any action on the device or volume.
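
A forced eject also has a Terminal counterpart in `diskutil`. The sketch below is an assumption about how one could do it, not what was done here; the helper names `force_unmount` and `list_disks` are hypothetical.

```shell
# Hypothetical helper: force-unmount a volume by mount point or device node.
# "diskutil unmount force" unmounts even if files are still open, so data
# that other processes have not yet written can be lost.
force_unmount() {
    diskutil unmount force "$1"
}

# Show the disks and volumes that are still attached, mounted or not.
list_disks() {
    diskutil list
}
```

After `force_unmount "/Volumes/TimeMachine_Example"` (a placeholder path), `list_disks` should still show the device, which matches the "connected but unmounted" state described above.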

We could now start First Aid on the selected (grayed-out) volume. The process started but appeared stuck for more than an hour after:

...
Checking extents overflow file.
Checking catalog file.

We found several web sources that suggest waiting overnight because this step can take a long time on large disks, e.g. "Disk utility stuck on checking catalog?" on Apple Community. I could no longer reply to that thread, so I re-created the question here to post my experience as my own answer.

After waiting overnight, the process finished and everything was fine.
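
While waiting, one way to tell a slow check from a dead one is to see whether an fsck process is still running. A minimal sketch, with a hypothetical helper name `fsck_status`; which process shows up (for example `fsck_hfs`, `fsck_apfs`, or `fsck_cs`) depends on the volume format.

```shell
# Hypothetical helper: list running fsck processes with their PID,
# CPU share, and elapsed time.
fsck_status() {
    # The [f] keeps grep from matching its own command line.
    ps -axo pid,pcpu,etime,command | grep '[f]sck'
}
```

Calling `fsck_status` a few minutes apart and seeing the same PID with a growing elapsed time suggests the check is still alive. A low CPU figure alone does not prove a hang, since the process may simply be waiting on the disk. No output means no fsck process was found.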

Posted on Aug 8, 2024 10:57 PM

Answer 1

Top-ranking reply

acsrarmin Author

Level 1

14 points

Aug 8, 2024 11:25 PM in response to acsrarmin

Answering my own question:

It makes sense to wait overnight to let First Aid finish its work!

The file system had to be grown to make the volume usable for a larger amount of data. macOS did not communicate this properly, and the operation may already have been in progress in the background.

Disclaimer: This report describes a legacy archive system running Catalina on a 2014 15-inch MacBook Pro with no internet connection, used as a low-energy server with an integrated UPS in a HengeDock. The same may apply to current macOS versions and hardware, or things may already have changed (for better or worse). Hopefully someone can find and use this documentation in a similar situation.

First Aid finished successfully overnight. After some further steps, the report ended with:

... omitted
... at the beginning ...
Performing fsck_cs -n -x --lv --uuid 54933A9D-*********************-62E2A7A02B15
... most omitted
Checking volume information.
The volume TimeMachine_accrStudioServer06 appears to be OK.
File system check exit code is 0.
Restoring the original state found as mounted.
Growing Logical Volume
Resizing Core Storage Logical Volume structures
Resized Core Storage Logical Volume to 5.000.164.016.128 bytes
Growing file system

... Success (or similar; my output was in German)

The interesting part is:

Growing Logical Volume
Resizing Core Storage Logical Volume structures
Resized Core Storage Logical Volume to 5.000.164.016.128 bytes
Growing file system
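
The size in the log uses dots as thousands separators (German number formatting). A quick sketch to turn it into more familiar units; the helper name `bytes_to_units` is hypothetical.

```shell
# Strip the thousands separators, then convert bytes to TB (10^12) and TiB (2^40).
bytes_to_units() {
    printf '%s\n' "$1" | tr -d '.,' |
        awk '{ printf "%.2f TB, %.2f TiB\n", $1 / 1e12, $1 / (2 ^ 40) }'
}

bytes_to_units "5.000.164.016.128"   # prints: 5.00 TB, 4.55 TiB
```

So the resized logical volume is about 5.00 TB, in line with the drive's nominal 5 TB capacity.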

Because we store a lot of old archive material on that legacy server, we end up with a large number of small files. After making backups over a long period to keep the history, it seems that the Core Storage Logical Volume structures were no longer sufficient, and macOS may already have started reorganizing the disk, without any notification, while a backup was in progress. This leads to conflicting operations and disk-thrashing head movement.

Fortunately, Time Machine can usually resume on-disk operations after interruptions better than is commonly reported when using APFS.

Conclusion

First Aid managed to resize the Core Storage Logical Volume structures and grow the file system.

Can you rely on this disk after repair?

1. We use a special device driver to run smartctl on external USB drives.
2. Power-on time was reported as a bit more than 11,000 hours, which is about 458 days, or roughly a year and a quarter of running time. The disk is stored in a safe place in rotation with two other media (two attached, one at a different location).
3. Because the issue was caused by a software crash, we expect the drive hardware to be OK.
4. The repair pointed to an obvious cause for the issue.
5. The repair was successful.
6. The smartctl extended report did not show any recorded errors at any time on that drive.
7. We have three media in rotation.
Our decision is simply to add one more drive soon and keep an eye on this one until the new drive has accumulated enough backup generations.
We are keeping the drive; it may yet save us in a situation where the others fail.
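
The running-time figure above can be checked with quick arithmetic. A small sketch, with a hypothetical helper name `hours_to_runtime`, assuming 24-hour days and 365-day years:

```shell
# Convert SMART power-on hours into days and years (365-day years).
hours_to_runtime() {
    awk -v h="$1" 'BEGIN { printf "%.0f days, %.2f years\n", h / 24, h / 24 / 365 }'
}

hours_to_runtime 11000   # prints: 458 days, 1.26 years
```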

Addendum

For those interested in running extended smartctl operations on external drives

(do not do this at home ;-):

We use:

Homebrew, with smartmontools installed via `brew install smartmontools`
The free external USB / FireWire drive diagnostics support driver suggested by, e.g., https://binaryfruit.com/drivedx/usb-drive-support, installed at our own risk!

If you want to perform external drive diagnostics on macOS – currently there is only one option – you can install 3rd party kernel extension – SAT SMART Driver. SAT SMART Driver is free open source project (published under Apple Public Source License) by Jarkko Sonninen. It is hosted on GitHub.
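
With smartmontools and a working pass-through driver in place, the checks mentioned above could look roughly like the sketch below. The helper names are hypothetical, as is the device `/dev/disk2` in the usage note (check `diskutil list` for the real identifier). Some USB bridges additionally need a device-type option such as `-d sat`, which is left out here.

```shell
# Hypothetical helpers (names are illustrative); only standard smartctl options are used.

# Full SMART report: identity, health status, attributes, error and self-test logs.
smart_report() {
    smartctl -a "$1"
}

# Start an extended (long) self-test; it runs inside the drive in the background.
smart_long_test() {
    smartctl -t long "$1"
}

# Show the self-test log once the test has had time to finish.
smart_test_log() {
    smartctl -l selftest "$1"
}

# Extract the raw Power_On_Hours value from an attribute table on stdin.
# Assumes the usual ATA layout where the raw value is the 10th column.
power_on_hours() {
    awk '$2 == "Power_On_Hours" { print $10 }'
}
```

For example, `smart_long_test /dev/disk2` followed some hours later by `smart_test_log /dev/disk2` covers the extended test, and `smartctl -A /dev/disk2 | power_on_hours` pulls out the running time. Depending on the setup, `smartctl` may need to be run with root privileges.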

Answer 2

HWTech

Level 9

68,768 points

Aug 9, 2024 7:26 PM in response to acsrarmin

acsrarmin wrote:

2. Power-on time was reported as a bit more than 11,000 hours, which is about 458 days, or roughly a year and a quarter of running time. The disk is stored in a safe place in rotation with two other media (two attached, one at a different location).

6. smartctl extended report did not show any recorded errors at any time on that drive

Addendum

For those who are interested to run extended smartctl operations on external drives

(do not do this at home ;-):

We use:

Homebrew and installed `brew install smartmontools`

FYI, there is no need for Homebrew, since the smartmontools project has provided precompiled binaries for macOS for many years now. Years ago that was not the case, but they have been very good about providing them for some time.

We installed the free External USB / FireWire drive diagnostics support Driver suggested from e.g. https://binaryfruit.com/drivedx/usb-drive-support on our own risk!

That is a very nice app since it is very user friendly. For hard drives DriveDx is great, since any "Warning" or "Failing" condition means the hard drive should be replaced because it is worn out or failing, respectively (personal experience supporting my organization's Macs). Unfortunately, apps like DriveDx can only alert you to a possible issue with an SSD, since not all Warning or Failing conditions for an SSD are fatal. SSDs need to have their health reports manually interpreted by someone familiar with SSDs. Unfortunately, many SSDs these days expose only very limited health information, which makes it very difficult to know the SSD's true condition.

Answer 3

BDAqua

Level 10

250,253 points

Aug 9, 2024 2:49 AM in response to acsrarmin

Thanks.:) 👍
