
Disk Utility: for bad blocks on hard disks, are seven overwrites any more effective than a single pass of zeros?

In this topic I'm not interested in security or data remanence (for such things we can turn to e.g. Wilders Security Forums).


I'm interested solely in best practice approaches to dealing with bad blocks on hard disks.


I read potentially conflicting information. Examples:


… 7-way write (not just zero all, it does NOT do a reliable safe job mapping out bad blocks) …

https://discussions.apple.com/thread/1732448?answerId=8191915022#8191915022 (2008-09-29)


… In theory zero all might find weak or bad blocks but there are better tools …

https://discussions.apple.com/thread/2362269?answerId=11199777022#11199777022 (2010-03-09)


… substitution will happen on the first re-write with Zeroes. More passes just takes longer.

https://discussions.apple.com/thread/2507329?answerId=12414270022#12414270022 (2010-10-12)


For bad block purposes alone I can't imagine seven overwrites being any more effective than a single pass of zeros.


Please, can anyone elaborate?


Anecdotally, I did find that a Disk Utility single pass of zeros seemed to make good (good enough for a particular purpose) a disk that was previously unreliable (a disk drive that had been dropped).

Intel and PowerPC desktops and laptops, Intel and G4 Xserves, Xserve RAIDs, Mac OS X (10.6.7), Mac OS X 10.5.8 and 10.6.7, Mac OS X Server 10.5.8 and 10.6.7

Posted on Apr 21, 2011 12:22 PM


28 replies
Question marked as Best reply

Apr 21, 2011 3:27 PM in response to Graham Perrin

Reformatting or overwriting a disk in an attempt to cause it to repair overt bad blocks is usually a waste of time and effort, in my experience.


By the time the disk is tossing enough visible errors (that can't be overcome using the device's EDC recovery and whatever RAID might be in use), the disk has usually failed; has exceeded its ability to apply whatever replacement sector scheme and error recovery it might possess. I'd replace it.


If the disk was effectively dealing with its inherent errors on read, then you should not be seeing these errors. With the read, you should either get the data back from the disk, or you get the data back with the assistance of the EDC, or you get an error and you get the data from elsewhere within the RAID, or you get an error and presumed-bogus data if the EDC and RAID cannot recover the data.


And rather than the whole-disk overwrite, an attempt to write a bad sector with a typical disk should automatically cause the disk to re-vector the write over to a spare, meaning you shouldn't need to do the wholesale overwrite.


If you're using the overwrite as a disk hardware test, that's another discussion. I'd probably use a different tool, and specifically targeted to disk verification, but certainly setting known patterns and then verifying them is entirely feasible.
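
For what it's worth, here is a rough sketch of that "set known patterns, then verify them" approach in Python. It is destructive, needs root, and the scratch device /dev/rdisk9 and the 1 MiB chunk size are placeholders, not recommendations:

```python
import os

DEV = "/dev/rdisk9"             # hypothetical scratch device: ALL DATA IS LOST
BLOCK = 1024 * 1024             # 1 MiB chunks, a multiple of any sector size
PATTERN = bytes([0xA5]) * BLOCK

def write_pattern(dev):
    """Fill the device with a uniform pattern; return the bytes written."""
    written = 0
    with open(dev, "r+b", buffering=0) as f:
        try:
            while True:
                n = f.write(PATTERN)
                if not n:
                    break        # end of device
                written += n
        except OSError:
            pass                 # write error, or end of device on some systems
    return written

def verify_pattern(dev, total):
    """Read everything back and report offsets that fail or don't match."""
    bad = []
    with open(dev, "rb", buffering=0) as f:
        for offset in range(0, total, BLOCK):
            f.seek(offset)
            try:
                chunk = f.read(min(BLOCK, total - offset))
            except OSError:
                bad.append(offset)   # hard read error in this chunk
                continue
            if chunk != PATTERN[:len(chunk)]:
                bad.append(offset)   # data came back wrong or short
    return bad

total = write_pattern(DEV)
print("wrote", total, "bytes; problem offsets:", verify_pattern(DEV, total))
```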


To see what the disk thinks is going on with the errors, query the SMART data. There are a few canaries in SMART, particularly including scan errors, as well as the reallocation count, offline reallocation and probational count values. Those tend to point to an impending failure. The rest of SMART probably isn't as predictive, and SMART in general isn't all that good at predicting failures.
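
As a concrete way to look at those counters: Disk Utility only surfaces a pass/fail verdict, but smartctl (from the third-party smartmontools package, which you would have to install yourself) prints the raw attributes. A small sketch, with /dev/disk0 as a placeholder:

```python
import subprocess

# Attribute IDs commonly watched as canaries:
#   5   Reallocated_Sector_Ct
#   196 Reallocated_Event_Count
#   197 Current_Pending_Sector   (probational sectors awaiting a write)
#   198 Offline_Uncorrectable
CANARIES = {"5", "196", "197", "198"}

out = subprocess.run(["smartctl", "-A", "/dev/disk0"],
                     capture_output=True, text=True).stdout

for line in out.splitlines():
    fields = line.split()
    if fields and fields[0] in CANARIES:
        print(line)
```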


And a disk drive that had been dropped? I'd probably expect to swap it.

Apr 24, 2011 8:39 AM in response to Graham Perrin

Reformatting or overwriting a disk in an attempt to cause it to repair overt bad blocks is usually a waste of time and effort, in my experience. Whether you run zero passes, one pass, a thousand or a billion passes, you (still) either have a disk with effective and functional bad block recovery (and sufficient spare blocks), or one that doesn't.


Multiple passes are an attempt to avoid data exposures through data remanence, by overwriting any slightly off-track recording of data (and that's a specialized attack requiring specific hardware), and you've explicitly excluded that topic from the discussion here.


Put another way, this seems to be an unfortunate tangling together of data remanence and associated overwrite recommendations, and the typical bad block revector-on-write support, and RAID. All with what looks to be failing hardware. And given the reported industry average of three to six hard errors per terabyte, RAID is one of the few available mechanisms for recovery from failed blocks, for when you encounter a block error.


Run as many passes as you want. I'd swap the disk. By the time disks are overtly tossing multiple (visible) errors, they're not devices I would trust with my data, and not worth further use. (Why "overt"? The typical three to six hard errors usually don't show.)

Apr 25, 2011 9:37 AM in response to MrHoffman

Thanks for another helpful answer.


I should have stated at the outset that I'm thinking of any disk drive that does have (a) functional bad block recovery and (b) sufficient spare blocks.


(Giving an example of just one such disk seems to have caused confusion.)

… tangling together of data remanence and associated overwrite recommendations, and the typical bad block revector-on-write support, and RAID. …


I did, still do, half-suspect a tangling of some sort. https://discussions.apple.com/thread/1732448?answerId=8191915022#8191915022 is archived so I can't reply there FAO The hatter.


Maybe a better way of phrasing my original question … defocusing from Disk Utility, considering the secureErase verb of diskutil:


is a random fill pattern — or the first pass of any type of fill pattern associated with level 2, 3 or 4 — any more likely than a pattern of zeros to trigger spare block substitution?


The October 2010 post by Grant Bennet-Alder seems clear enough re: the effect of a pattern of zeros.


Still, I'd like to leave this question open/unanswered for a while, in case people have anything to add concerning fill patterns for spare block substitution purposes.
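
For concreteness, these are the invocations I have in mind; a sketch only, with disk9 as a placeholder scratch disk, and the level meanings as I read them from the diskutil man page (0 single-pass zeros, 1 single-pass random, 2 seven-pass, 3 Gutmann 35-pass, 4 three-pass):

```python
import subprocess

def secure_erase(level, disk):
    """DESTRUCTIVE: erases the whole device with the given fill scheme."""
    subprocess.run(["diskutil", "secureErase", str(level), disk], check=True)

# Level 0, a single pass of zeros:
# secure_erase(0, "disk9")

# Level 1, a single pass of random data -- is this any more likely than
# zeros to trigger spare block substitution?
# secure_erase(1, "disk9")

# Levels 2-4, the multi-pass schemes -- does the first pass of any of these
# behave differently from the single-pass cases above?
# secure_erase(2, "disk9")
```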

Nov 7, 2012 10:55 AM in response to Graham Perrin

Actually, despite all the noise in this thread, Disk Utility is able to remove bad blocks from the usable pool on a drive when it is used to erase the drive and write all zeros.


Backup your data first!!


Now, the point about "your hardware is failing" isn't necessarily wrong, but it isn't necessarily right either. If you have a situation where a drive has bad sectors, and it's not easy to replace it, using Disk Utility can help.


I have resurrected many hard drives and iPods this way, and MOST of them continue working indefinitely.


Good Luck,

Marty


PS Obviously in any mission critical application you replace the damaged drive ASAP.

Dec 15, 2012 7:59 AM in response to Graham Perrin

Go talk with the disk vendors. (If they're interested in discussing this, and that's not a certainty.) Or acquire data from a large provider, as Google and CMU did with their technical papers on this topic; here are links to some of the papers.


My comments (above) are from common practice within large and enterprise businesses. Dinking around with a comparatively cheap disk that's tossing visible errors is not an effective use of repair time, and risks recurrences.


Disks always toss errors. If they're working reasonably and the error is within the error detection and correction (EDC) available in the drivers and/or firmware, you'll get the data corrected and the block transparently flagged for replacement on the next block write. If the error is outside what the EDC can recover, you're rolling in backups.


Can that disk that's tossing visible errors be functional? Sure.


Can the errors be isolated? Certainly.


Can pattern overwrites be used to force revectoring, or to demonstrate flaws in the EDC? Yes. Rolling in a full disk backup can have similar effects.
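
If the failing LBA is known (from the logs or a scan), a targeted rewrite of just that sector is enough to trigger the revector, without the whole-disk overwrite. A rough sketch, where the device path, sector size and LBA are placeholders, and the sector's previous contents are lost:

```python
import os

DEV = "/dev/rdisk9"    # placeholder device node
SECTOR = 512           # 4096 on 4Kn drives
BAD_LBA = 123456       # placeholder LBA reported for the failing sector

fd = os.open(DEV, os.O_WRONLY)
try:
    os.lseek(fd, BAD_LBA * SECTOR, os.SEEK_SET)
    os.write(fd, b"\x00" * SECTOR)   # the write itself is what forces the remap
finally:
    os.close(fd)
```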


But my preference is to swap the disk. Preferably swapping for an equal or better grade of disk, as the cheap disks are usually cheap for a reason; shipped with fewer spare blocks, or with less reliable EDC, or other such. Disks are cheap. The cost of the repairs and data loss are more expensive.


Obviously: configure and maintain RAID or Time Machine or some other form of backup, if the data matters.

Feb 4, 2013 1:13 PM in response to Graham Perrin

@MrHoffman

As well put as your answers are, you are not answering the original question, and regarding consumer-device hard drives your answers are misleading.

Consumer-device hard drives ONLY remap a bad sector on write. That means that regardless of how much spare capacity the drive has, it will NEVER remap a sector that only fails on read. That means you ALWAYS have a bad file containing a bad sector.

In other words YOU would throw away an otherwise fully functional drive. That might be reasonable in a big enterprise where it is cheaper to replace the drive and let the RAID system take care of it.

However, on an iMac or MacBook (Pro) an ordinary user cannot replace the drive himself, so on top of the cost of the drive he has to pay the repair bill (for a drive that is likely STILL in perfect shape, except for the one not-yet-remapped bad block).

You simply miss the point that the drive can still have a million good reserve blocks, but will never remap the affected block in a particular email or particular song or particular calendar. So as soon as the affected file is READ, the machine hangs, and all other processes more or less hang the moment they try to perform I/O, because the process trying to read the bad block is blocked in the kernel. This happens regardless of how many free reserve blocks you have, as the bad block never gets reallocated unless it is written to. And your email program won't rewrite an email that is 4 years old for you ... because it is not programmed to realize that a certain file needs to be rewritten to get rid of a bad block.


@Graham Perrin

You are similarly stubborn in not realizing that your original question has been answered.

A bad block gets remapped on write.

So obviously it happens at the first write.

How do you come to the strange idea that writing several times makes a difference? How do you come to the strange idea that the bytes you write make a difference? Suppose block 1234 is bad. And the blocks 100,000,000 to 100,000,999 are reserve blocks. When you write '********' to block 1234 the hard drive (firmware) will remap it to e.g. 100,000,101. All subsequent writes will go to the same NEW block. So why do you ask if doing it several times will 'improve' this? After all the answers here you should have realized: your question makes no sense as soon as you have understood how remapping works (or is supposed to work). And no: it does not matter whether you write a sequence of zeros, of '0's or of '1's or of 1s, or of your social security number, or just 'help me, I'm held prisoner in a software forum'.


I would try to find software that identifies which file is affected, then try to read the bad block until you have in fact read it (that works surprisingly often, but may take anything from a few minutes to hours) ... in other words, you need software that tries to read the file and copies it completely, so that even the bad block is (hopefully) read successfully. Then write the whole data to a new file and delete the old one (deleting will free the bad block, and at some later time something will be written there and cause a remap).
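
Something along these lines; a sketch only, with placeholder paths, and in real use you would cap the retries rather than hammer a truly dead block forever:

```python
CHUNK = 64 * 1024

def stubborn_copy(src, dst, max_retries=100):
    """Copy src to dst, retrying reads that fail on a weak block."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while True:
            for attempt in range(max_retries):
                try:
                    fin.seek(offset)
                    chunk = fin.read(CHUNK)
                    break
                except OSError:
                    continue            # I/O error: try the same spot again
            else:
                raise OSError("gave up at offset %d" % offset)
            if not chunk:
                return                  # end of file: copy complete
            fout.write(chunk)
            offset += len(chunk)

# stubborn_copy("old-message.emlx", "old-message.emlx.recovered")
# Afterwards replace the original with the copy; deleting the original frees
# the weak block, and a later write to it will cause the remap.
```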


Writing zeros into the bad block basically only helps if you don't care that the affected file is corrupted afterwards. E.g. in the case of a movie, the player might crash when trying to display the affected area. If you know the affected file is a text file, it would make more sense to write a bunch of '-' signs, as they are readable while zero bytes are not (a text file is not supposed to contain zero bytes).


Hope that helped ;)

Feb 5, 2013 5:55 AM in response to Angelo Schneider

Thank you for your feedback, Angelo Schneider. Your reply refers to rewriting files or reloading data, and to the error-recovery sequences that involve sector writes (which can happen by rewriting the file, by rolling in a copy of the file, or through normal operations), and not to erasing or reformatting the drive.


On rereading my replies, I did indicate that: that reloading, rewriting and RAID can resolve issues with bad blocks through rewriting. As you correctly indicate, that's how sectors are revectored.


The original poster was specifically interested in using erasure and reformatting as a step for recovering from bad blocks, which isn't a step I view as being necessary or even useful, and — when folks are resorting to reformatting a drive for bad blocks — is usually an indication that the drive is long past replacement.


As for why the OP is referring to a multi-pass operation, some folks use that as a scrubber or as a way to expose additional failing blocks. In the absence of a disk diagnostic, overwrites are also sometimes used for disk qualifications, and to try to force a disk over into overt failure due to heating or other problems. (Not my preference, given I've had disks that don't read back the same data that was written; one had a two-byte "slip" in what was written, about two thirds through the disk.) Sometimes a RAID build or rebuild is used, too.


Yes, I do replace drives. Quite possibly earlier than some folks choose to. I just replaced a few disks yesterday, though those were well and truly bricked. As for costs of replacements, my data and the data of the folks I know and my time is worth more than the savings that I might accrue from successfully repairing a questionable disk drive.


Again, thanks.

Sep 29, 2013 4:42 PM in response to Graham Perrin

I take the following to mean that YES, 7-pass (or 3-pass) may find more bad blocks than a single pass, by the mere fact that some blocks may be, well, "iffy" and need more than one pass to be rejected. Please correct me if I am misinterpreting the following:

[I]f you're going to be committing important data to the drive, you may wish to... exercise the drive, by writing and reading data from as many locations as possible for as much time as you can spare.... [A]ny weak spot will show itself now instead of sometime down the road.


From:


Revive a Hard Drive for Use With Your Mac


Scanning for Bad Blocks

This next step will check every location of the drive and determine that each section can have data written to it, and the correct data read back. In the process of performing this step, the utilities we use will also mark any section that is unable to be written to or read from as a bad block. This prevents the drive from using these areas later....


When Disk Utility uses the Zero Out Data option, it will trigger the drive's built-in Spare Bad Blocks routine as part of the erasure process....


[I]f you're going to be committing important data to the drive, you may wish to run one more test. This is a drive stress test, sometimes referred to as a burn-in. The purpose is to exercise the drive, by writing and reading data from as many locations as possible for as much time as you can spare. The idea is that any weak spot will show itself now instead of sometime down the road.


There are a few ways to perform a stress test, but in all cases, we want the entire volume to be written to and read back.


Stress Test With Disk Utility

When Disk Utility uses the DOE-compliant 3-pass secure erase, it will write two passes of random data and then a single pass of a known data pattern. [Or 7-pass will write over data 7 times.] ... Once the erasure is complete, if Disk Utility shows no errors, you're ready to use the drive knowing it's in great shape.

[1]: http://macs.about.com/od/MacTroubleshootingTips/ss/Reviving-A-Hard-Drive-For-Use-With-Your-Mac.htm

Sep 29, 2013 5:56 PM in response to worldpoop

Write operations should trigger a block replacement, whether it's an overwrite or just writing application data.


"Rotating-rust" magnetic disks do usually either fail fairly quickly after installation and first use, or will eventially wear out after some amount of usage, so always have backups of your data. If your data is important enough, then RAID (which is not a backup) and have on-site and off-site backups of your data.


Reviving a questionable hard drive is... not a strategy I'd recommend, as I've commented up-thread. More than a few folks have tried that approach over the eons (even back when some half-gigabyte drives weighed 68 kg, were 19 inches wide, and cost ~US$12,000 and more), for any of various reasons, and I cannot recommend it.


Once a disk device flakes out, I replace it.


Particularly these days.


Disks are too cheap these days and the data on that disk is too valuable to mess around with a questionable disk device. Once a disk starts throwing visible bad blocks, it's usually on its way to failure. Sure. You might get lucky, and might have encountered an isolated failure. If not and if the disk is more generally failing, then you risk your data (again) and spend more time shuffling the data around.

Sep 29, 2013 11:06 PM in response to MrHoffman

Hello - Thank you for your response! These are excellent and informed recommendations (truly) about whether or not to revive drives versus purchasing new ones. Basically, in most cases one is better off buying a good new hard drive. If you care about data, that's good advice.


Meanwhile, like the original poster above, I too have a situation where I seek the best way to unearth bad blocks on an existing hard drive. We are not alone in this big world I'm sure (though it may be lonely!) 🙂 The salient technical question re: manner of practice is this: Does multi-pass make a difference?


On the question of one- vs. multi-pass rewrite erase, this author bases advice on the assumption that multi-pass serves as a "burn in" (also called a "stress test") to indeed increase detection of poor blocks. Does that assertion have merit; is it technically accurate?


Thank you!
