Internal hard drive/repair: SMART status; Disk Utility, Tech Tool & More

Question

Level 4

1,245 points

machine: 12" PowerBook 1.5 GHz PowerPC G4 (Aluminium) with 80GB internal HD
internal hard drive (original): Hitachi Travelstar 5K100 series HTS541080G9AT00
hard drive firmware: MB4AA5AJ
ATA version: 6
ATA standard: ATA/ATAPI-6 T13 1410D revision 3a

I have monitored the SMART status etc. of my internal drive with smartmontools for some time. Occasionally, I would see an error or a failed self-test but later testing always succeeded and things did not seem to be problematic.

At the beginning of this week, I started to see a lot of failed self-tests (though some still passed), a rising number of bad ("pending") sectors and reallocation attempts (though no reallocated sectors) and various other errors. The computer seemed to have trouble reading from the disk at times and Carbon Copy Cloner reported two I/O errors when cloning (a later clone succeeded). fsck showed errors although running fsck -fy repeatedly seemed to resolve them.

The Apple Hardware Test initially reported an error (2STF/8/3:ATA-100 ata-6-Master), but I hadn't realised I should disconnect peripherals before running it. I did that and repeated the test, which found no issues. I ran the extended test a total of three times with no errors.

As I continued to have problems, I booted from my clone and had Disk Utility wipe the drive by writing zeros to it. (I thought that forcing it to write everywhere would either finish the drive off or make it reallocate the bad blocks. As I understand it, in normal use the drive won't reallocate a block until it has recovered the data, and it keeps the block pending in the hope of reading it at a later time.)

I then continued monitoring the disk using smartmontools. At this point, short self-tests succeed but extended self-tests "disappear". They don't fail, they simply vanish. They begin and smartctl shows the test in progress, but then no error or result is logged - it is as if the test was never run. The first time I did this, I got an error saying the SMART attributes could not be read, but subsequent tests do not even trigger an error. Short self-tests continue to pass.

There are now (according to SMART) zero bad ("pending") sectors but also zero reallocated sectors, which seems odd. The raw read error rate fluctuates (zero one minute, many thousands a while later), although I am not sure it did not do this before.

Disk Utility claims the disk does not support SMART status even though smartctl clearly shows it does. Disk Utility claims the volume is "OK".

I ran Tech Tool Deluxe 3.04 from CD and 3.1.1 from my clone. In both cases, I ran all available tests on the drive. No problems were found.

I am seeing some problems even while booted from my clone - yesterday, the system froze completely and I had to force a shut-down by switching off the power (fsck then found a minor error but repaired it). Just before this happened, I was unable to mount a disk image and was trying to rectify the situation when the system froze. It is possible that the errors fsck corrected were implicated in the freeze, rather than caused by it, since Disk Utility found and repaired similar errors for the other two clones I have (on different partitions of my external drive - yes, I know this is sub-optimal).

Here is some current output from smartmontools:
---output: smartctl -q noserial -a disk0---
smartctl version 5.38 [powerpc-apple-darwin8.11.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Travelstar 5K100 series
Device Model: Hitachi HTS541080G9AT00
Firmware Version: MB4AA5AJ
User Capacity: 80,026,361,856 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
Local Time is: Sat Jul 3 20:28:52 2010 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 645) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 55) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 142 142 033 Pre-fail Always - 2
4 Start_Stop_Count 0x0012 097 097 000 Old_age Always - 5793
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 039 039 000 Old_age Always - 26720
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 4168
191 G-Sense_Error_Rate 0x000a 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 201877487644
193 Load_Cycle_Count 0x0012 001 001 000 Old_age Always - 2209599
194 Temperature_Celsius 0x0002 141 141 000 Old_age Always - 39 (Lifetime Min/Max 16/46)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 194
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 2138 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2138 occurred at disk power-on lifetime: 26716 hours (1113 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 00 09 4f c2 a0 Error: IDNF

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d6 01 09 4f c2 a0 00 08:30:04.500 SMART WRITE LOG
b0 d5 01 09 4f c2 a0 00 08:30:04.300 SMART READ LOG
b0 d1 00 00 4f c2 a0 00 08:30:04.000 SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
b0 d0 00 00 4f c2 a0 00 08:30:04.000 SMART READ DATA
b0 da 00 00 4f c2 a0 00 08:30:04.000 SMART RETURN STATUS

Error 2137 occurred at disk power-on lifetime: 26682 hours (1111 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 28 18 95 98 e4 Error: UNC 40 sectors at LBA = 0x04989518 = 77108504

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 80 c0 94 98 e0 00 1d+01:30:37.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:30.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:22.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:16.300 READ DMA EXT
25 00 80 40 94 98 e0 00 1d+01:30:14.700 READ DMA EXT

Error 2136 occurred at disk power-on lifetime: 26682 hours (1111 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 21 1f 95 98 e4 Error: UNC 33 sectors at LBA = 0x0498951f = 77108511

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 80 c0 94 98 e0 00 1d+01:30:30.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:22.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:16.300 READ DMA EXT
25 00 80 40 94 98 e0 00 1d+01:30:14.700 READ DMA EXT
25 00 80 c0 93 98 e0 00 1d+01:30:14.100 READ DMA EXT

Error 2135 occurred at disk power-on lifetime: 26682 hours (1111 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 22 1e 95 98 e4 Error: UNC 34 sectors at LBA = 0x0498951e = 77108510

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 80 c0 94 98 e0 00 1d+01:30:22.600 READ DMA EXT
25 00 80 c0 94 98 e0 00 1d+01:30:16.300 READ DMA EXT
25 00 80 40 94 98 e0 00 1d+01:30:14.700 READ DMA EXT
25 00 80 c0 93 98 e0 00 1d+01:30:14.100 READ DMA EXT
25 00 80 40 93 98 e0 00 1d+01:30:13.500 READ DMA EXT

Error 2134 occurred at disk power-on lifetime: 26682 hours (1111 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 28 18 95 98 e4 Error: UNC 40 sectors at LBA = 0x04989518 = 77108504

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 80 c0 94 98 e0 00 1d+01:30:16.300 READ DMA EXT
25 00 80 40 94 98 e0 00 1d+01:30:14.700 READ DMA EXT
25 00 80 c0 93 98 e0 00 1d+01:30:14.100 READ DMA EXT
25 00 80 40 93 98 e0 00 1d+01:30:13.500 READ DMA EXT
25 00 80 c0 92 98 e0 00 1d+01:30:12.100 READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 26717 -
# 2 Short offline Completed without error 00% 26717 -
# 3 Short offline Completed without error 00% 26717 -
# 4 Short offline Completed without error 00% 26717 -
# 5 Short offline Completed without error 00% 26716 -
# 6 Short offline Completed without error 00% 26715 -
# 7 Short offline Completed without error 00% 26714 -
# 8 Short offline Completed without error 00% 26697 -
# 9 Extended offline Completed: read failure 10% 26692 75071682
#10 Short offline Completed without error 00% 26687 -
#11 Extended offline Completed: read failure 10% 26676 77108511
#12 Short offline Completed: read failure 40% 26663 296917
#13 Short offline Completed without error 00% 26657 -
#14 Short offline Completed: read failure 10% 26646 296918
#15 Short offline Completed: read failure 70% 26645 296905
#16 Short offline Completed: read failure 10% 26644 296910
#17 Short offline Completed: read failure 10% 26644 296910
#18 Short offline Completed: read failure 40% 26643 296916
#19 Short offline Completed: read failure 20% 26643 296909
#20 Extended offline Completed without error 00% 26620 -
#21 Short offline Completed without error 00% 26618 -

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
---end output---

Self-tests 1-8 were all run after I erased the disk. The others were run before I did so. Not listed are the extended self-tests I've started since erasing which, as I say, have simply disappeared.
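
For anyone cross-checking the logs: the LBA that smartctl prints beside each UNC error can be rebuilt from the SN, CL, CH and DH registers in the error log above. What follows is only a minimal Python sketch, not part of smartmontools. The function name lba28_from_registers is made up for illustration, and the 28-bit register layout is an assumption that happens to reproduce the figures smartctl printed. Error 2138 is left out because it was raised by a SMART command, whose registers do not hold a sector address.

```python
# Sketch: rebuild the LBA of each UNC error from the raw ATA registers
# in the error log, then compare with the self-test log.
# lba28_from_registers is a hypothetical helper, not a smartmontools API.


def lba28_from_registers(sn, cl, ch, dh):
    """Combine Sector Number, Cylinder Low, Cylinder High and the low
    nibble of Device/Head into a 28-bit logical block address."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# (error number, SN, CL, CH, DH) copied from the error log above
error_registers = [
    (2137, 0x18, 0x95, 0x98, 0xE4),
    (2136, 0x1F, 0x95, 0x98, 0xE4),
    (2135, 0x1E, 0x95, 0x98, 0xE4),
    (2134, 0x18, 0x95, 0x98, 0xE4),
]

# LBA of first error for every failed entry in the self-test log
selftest_failures = [
    75071682, 77108511, 296917, 296918, 296905,
    296910, 296910, 296916, 296909,
]

error_lbas = {
    num: lba28_from_registers(sn, cl, ch, dh)
    for num, sn, cl, ch, dh in error_registers
}

for num, lba in sorted(error_lbas.items(), reverse=True):
    print(f"Error {num}: LBA 0x{lba:08x} = {lba}")

# Addresses that appear in both the error log and the self-test log
shared = sorted(set(error_lbas.values()) & set(selftest_failures))
print("Seen in both logs:", shared)
```

With the values above, the four UNC errors fall between LBA 77108504 and LBA 77108511, and 77108511 is also where extended self-test #11 stopped, so both logs point at the same small region of the disk.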

The checksum error regarding the log is normal for this disk. At least, I've had that error ever since I started using smartmontools for monitoring, so I assume it is normal. (I installed this version of the software in about April 2008 and have used it since.) Also, starting offline testing has never succeeded on this disk (I think I checked into this at the time but can't quite remember - this is not new, anyway).
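
One way to read the attribute table is to compare each normalized VALUE with its THRESH, since smartctl treats an attribute as failed only when the value is at or below the threshold. The Python sketch below does that for a handful of rows copied from the output above. It is an illustration only: the Attr tuple and the "watch list" of IDs are my own assumptions, not anything defined by SMART or smartmontools.

```python
# Sketch: check a few SMART attributes from the table above.
# The Attr tuple and WATCHED_IDS are illustrative assumptions.
from collections import namedtuple

Attr = namedtuple("Attr", "id name value worst thresh raw")

attrs = [
    Attr(1, "Raw_Read_Error_Rate", 100, 100, 62, 0),
    Attr(5, "Reallocated_Sector_Ct", 100, 100, 5, 0),
    Attr(193, "Load_Cycle_Count", 1, 1, 0, 2209599),
    Attr(196, "Reallocated_Event_Count", 100, 100, 0, 194),
    Attr(197, "Current_Pending_Sector", 100, 100, 0, 0),
    Attr(198, "Offline_Uncorrectable", 100, 100, 0, 0),
]

# An attribute counts as failed when VALUE is at or below THRESH.
failed = [a.id for a in attrs if a.value <= a.thresh]

# Assumption: sector-related attributes whose raw count is worth
# watching even when the normalized value still looks healthy.
WATCHED_IDS = {5, 196, 197, 198}
nonzero_watched = [a.id for a in attrs if a.id in WATCHED_IDS and a.raw > 0]

print("At or below threshold:", failed)
print("Watched attributes with non-zero raw value:", nonzero_watched)
```

With these numbers nothing is at or below its threshold, which is consistent with the PASSED health result, while the reallocated event count (ID 196) is the only watched attribute with a non-zero raw value.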

Questions

how should I interpret all this? (Why does Disk Utility say SMART is not supported while smartctl clearly shows it is at the same time? Why do the extended tests simply vanish? Is the drive definitely dying?) I do not want to replace the disk unless I absolutely have to because I understand that replacing disks in 12" PBs is no small matter and cost is an issue. I do not want to replace the machine unless I have to because cost is an issue and, also, I really like this computer and have no idea what I would want in its place, even if expense were no obstacle.

if it isn't clear whether the drive is dying or not, is there some further strategy I can use to establish this?

if the drive is dying, is a machine of this age worth repairing and, if it is, under what circumstances is it worth doing so? For example, it might be worth doing if you can do the job yourself, but that might be quite impractical for non-expert (not to mention, inexpert) users.

if there is a hardware problem (which I obviously think is very, very probable at this point), is it definitely a dying hard drive? (I've seen people write ominous things about disk controllers etc. which I gather are more serious - or less repairable - than a "mere" dying disk.)

what questions should I be asking you and what are the answers to those questions?!

Many thanks for your patience in reading this far.

- cfr

12" PowerBook 1.5 GHz PowerPC G4 (Aluminium), Mac OS X (10.4.11), 1.25 GB DDR SDRAM 80 GB HD

Posted on Jul 3, 2010 12:59 PM

Answer 1

eww

Level 9

53,028 points

Jul 3, 2010 1:43 PM in response to Clea Rees

Clea: The short answer to your very long question is that it's never wise to rely on any drive about which you have doubts. Drives are cheap. Replace yours, and then you won't have to worry about it any more.

Answer 2

Clea Rees Author

Level 4

1,245 points

Jul 4, 2010 4:09 PM in response to eww

Thanks. I do know that is the standard advice and I appreciate it. However, while it is true that drives are (relatively) cheap, getting them into laptops is not. At least, getting them into 12" PowerBooks is not. If I thought this was something I could do myself, it would be different, but I doubt it and I do not, unfortunately, have any suitably skilled and handily situated friend whose arm I could twist to assist me!

Answer 3

BGreg

Level 6

17,552 points

Jul 5, 2010 5:45 AM in response to Clea Rees

The 12" PB's internals are a bit more complex than those of other PowerBooks. If you don't want to replace the hard drive yourself or pay someone to install it, you could always get an external FireWire hard drive, and use it to boot from and for general usage. It would have to be FireWire, since the PB won't boot from a USB device. One example of what you could get is a 160GB external hard drive: http://eshop.macsales.com/item/Other%20World%20Computing/MS4U5160GB8/ . All the choices with that case are listed here: http://eshop.macsales.com/shop/firewire/on-the-go

Have you called any Apple Authorized Service Providers to see what they would charge to install a drive for you, whether you bought it or they supplied it? You can find a local one in the US at http://www.apple.com/buy/locator/service/
