Disk Utility: for bad blocks on hard disks, are seven overwrites any more effective than a single pass of zeros?

In this topic I'm not interested in security or data remanence (for such things we can turn to e.g. Wilders Security Forums).


I'm interested solely in best practice approaches to dealing with bad blocks on hard disks.


I read potentially conflicting information. Examples:


… 7-way write (not just zero all, it does NOT do a reliable safe job mapping out bad blocks) …

https://discussions.apple.com/thread/1732448?answerId=8191915022#8191915022 (2008-09-29)


… In theory zero all might find weak or bad blocks but there are better tools …

https://discussions.apple.com/thread/2362269?answerId=11199777022#11199777022 (2010-03-09)


… substitution will happen on the first re-write with Zeroes. More passes just takes longer.

https://discussions.apple.com/thread/2507329?answerId=12414270022#12414270022 (2010-10-12)


For bad block purposes alone I can't imagine seven overwrites being any more effective than a single pass of zeros.


Please, can anyone elaborate?


Anecdotally, I did find that a Disk Utility single pass of zeros seemed to make good (good enough for a particular purpose) a disk that was previously unreliable (a disk drive that had been dropped).

Intel and PowerPC desktops and laptops, Intel and G4 Xserves, Xserve RAIDs, Mac OS X (10.6.7), Mac OS X 10.5.8 and 10.6.7, Mac OS X Server 10.5.8 and 10.6.7

Posted on Apr 21, 2011 12:22 PM

Question marked as Best reply

Posted on Apr 21, 2011 3:27 PM

Reformatting or overwriting a disk in an attempt to cause it to repair overt bad blocks is usually a waste of time and effort, in my experience.


By the time the disk is tossing enough visible errors (errors that can't be overcome using the device's EDC recovery and whatever RAID might be in use), the disk has usually failed; it has exceeded its ability to apply whatever replacement-sector scheme and error recovery it might possess. I'd replace it.


If the disk were effectively dealing with its inherent errors on read, you should not be seeing these errors. On a read, you either get the data back from the disk, or you get it back with the assistance of the EDC, or you get an error and retrieve the data from elsewhere within the RAID, or you get an error and presumed-bogus data if neither the EDC nor the RAID can recover it.


And rather than the whole-disk overwrite, an attempt to write a bad sector with a typical disk should automatically cause the disk to re-vector the write over to a spare, meaning you shouldn't need to do the wholesale overwrite.


If you're using the overwrite as a disk hardware test, that's another discussion. I'd probably use a different tool, and specifically targeted to disk verification, but certainly setting known patterns and then verifying them is entirely feasible.


To see what the disk thinks is going on with the errors, query the SMART data. There are a few canaries in SMART, particularly including scan errors, as well as the reallocation count, offline reallocation and probational count values. Those tend to point to an impending failure. The rest of SMART probably isn't as predictive, and SMART in general isn't all that good at predicting failures.
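For anyone who wants to look at that SMART data directly, here is a minimal command-line sketch. It assumes a smartmontools build is installed (MacPorts and other free builds carry it) and that /dev/disk0 is the drive in question.

    # Overall health verdict plus the full vendor attribute table
    sudo smartctl -H -A /dev/disk0

    # Attributes worth treating as canaries:
    #   5   Reallocated_Sector_Ct    - sectors already remapped to spares
    #   196 Reallocated_Event_Count  - remapping operations performed
    #   197 Current_Pending_Sector   - "probational" sectors awaiting reallocation
    #   198 Offline_Uncorrectable    - sectors the offline scan could not read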


And a disk drive that had been dropped? I'd probably expect to swap it.

28 replies

Nov 7, 2013 1:46 PM in response to Graham Perrin

The problem is: depending on the disk, the number of bad blocks can increase very fast. That is not always the case, but I have seen it happen occasionally.


Usually I use the badblocks command on Linux to test hard drives. The nice thing is that you see the block numbers that failed, which gives you a good feel for what is going on. You can also see how the scan slows down when blocks are still readable but need multiple spins of the disk to be read.
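As a rough sketch of that kind of run (the device name /dev/sdb and the log file name are placeholders, and the -w test is destructive):

    # Non-destructive, read-only scan; prints the number of every block that fails
    sudo badblocks -sv /dev/sdb

    # Destructive read-write pattern test (wipes the disk!), logging failed
    # block numbers to a file so passes can be compared later
    sudo badblocks -wsv -o badblocks-pass1.txt /dev/sdb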


If a disk fails, I would not trust it any more - see below:


When performing multiple scans, I found that blocks that could be read on the first pass, but which are close to blocks that failed on the first pass, have an increased likelihood of failing on the second pass. I had more than one disk for which this never converged after a reasonable number of passes, i.e. the bad region kept growing steadily.


I also had a disk on which the number of bad blocks remained unchanged after the first pass. But I still would not trust that disk. If you have important data, I would not put it on a bad disk. If your data is not important, I would consider just deleting it.

Nov 18, 2013 12:41 PM in response to Graham Perrin

Hi Graham:


There are several things at play here. The first is security, the second is mapping out a bad block. Don't confuse the two.


If a contemporary HD has a bad sector in it, one of the "tricks" used to force it to be mapped out is to use the security feature of Disk Utility or other tools to write zeros to the bad block. When the write fails, the controller in the HD remaps the sector to a spare sector, if there are any left. Doing this seven times won't make a difference. It's either going to re-map or it isn't.
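As an illustration of that trick applied to a single suspect sector rather than the whole drive, here is a hedged sketch; the disk identifier, the sector number, and the 512-byte sector size are all placeholders, and writing to the raw device destroys whatever was stored in that sector:

    # Unmount so nothing else is using the volume
    diskutil unmountDisk /dev/disk2

    # Overwrite one sector with zeros (LBA 123456 and bs=512 are assumptions);
    # on a healthy drive a failed write should be silently re-vectored to a spare
    sudo dd if=/dev/zero of=/dev/rdisk2 bs=512 seek=123456 count=1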


I would suggest the 7-pass information you're getting is either from someone who is confusing security with block remapping, or from someone thinking of old ST-506 based MFM hard drives (circa 1980s). Those old MFM drives were expensive ($300 for a 10MB drive NOTE: that's an "M", not a "G") and it was worth making an attempt at repairing them. The technique was to go to the bad sector and overwrite it many times in hopes of "clearing" it, and if that failed, put the sector into a list of bad sectors. This is really, really old stuff. No one does that anymore. I believe one of the tools that did this was Spinrite. I don't remember for sure. With age comes fog.😉


I recently had to go through a lot of computers that we were originally considering re-selling, but ended up selling or giving them away to staff:


https://discussions.apple.com/thread/5385790?answerId=23798487022#23798487022


I ended up getting Scannerz because it could scan the drives faster than anything else and could be used for more advanced troubleshooting. One section of the manual tells you how to re-map bad blocks, and it tells you to do it using Disk Utility by zeroing the drive. It says nothing about doing this 7 times. If a bad block will re-map, it will do so on the first try. They also hint that you might be better off getting a new drive.


From a cost standpoint, attempting to fix a hard drive probably isn't worth it anymore. If I have a technician who's getting paid $20/hour, the effective rate goes up a lot once you figure in overhead costs. If I had one of these guys troubleshooting something like that, the cost could easily run into hundreds of dollars, for what is now an inexpensive part. I can get a new 250GB HD here in the U.S. for just over $40. Why spend more money trying to repair a drive that may turn out to be unrepairable when you can just get a newer, likely better one?


I've seen the "zeroing" trick work. If there are spare sectors and the bad blocks are limited, they will remap. However, Google did a study on HD failures that implies once a drive has detected errors, more are likely to follow. I don't have a link for it, but it's on the web. IIRC the life expectancy of a drive that's showing errors is 6 months or less.

Apr 19, 2014 4:11 PM in response to Graham Perrin

Many of the products pitched in this thread aren't needed if you're confident in your ability to use a shell. Most hard drives that would be flagged by the recommendations in this thread will also be perfectly good hardware.


Read errors are a common result of cutting power or highly unstable AC power. Desktop (non-enterprise) drives often don't handle either of these very well. A brown out or hard power cut can give a read error that is safely handled with a single overwrite.


To check drive status, you can use the smartctl utility to read your drive's SMART data. There are many free builds available, including in MacPorts. If you see the attribute 197 (Current Pending Sector) and 198 (Offline Uncorrectable) counts raised by equal amounts and everything else is within tolerance, your next step is to find which files have a problem, because you want to try overwriting those.
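A minimal sketch of that check, assuming the MacPorts build of smartmontools and assuming the drive is /dev/disk0:

    # One of several free builds (MacPorts shown)
    sudo port install smartmontools

    # Print the attribute table and pull out the pending / uncorrectable counts
    sudo smartctl -A /dev/disk0 | egrep 'Current_Pending_Sector|Offline_Uncorrectable'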


To find the files, use fsck_hfs to perform a block scan. This can take hours to complete, but you can do it on a live system. Look at man fsck_hfs for the -S option.
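A sketch of that scan follows; the identifier disk0s2 is an assumption, so check diskutil list for your own, and see man fsck_hfs for the exact option behaviour on your system:

    diskutil list                      # find the device node for your volume

    # Surface-scan the whole device and report which files (if any) own the
    # bad blocks; -l performs the locked-down, test-only check on a live volume
    sudo fsck_hfs -l -S /dev/rdisk0s2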


To overwrite the files, move the affected files to the Trash, enable the Finder option to empty the Trash securely (this performs an overwrite), empty it, then check whether the 197 and 198 counts have dropped back to zero.


If fsck_hfs didn't find files that account for all blocks, your next step is to use Disk Utility to zero out free space on your drive, then check smartctl again.
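Roughly the shell equivalent (the volume name "Macintosh HD" is an assumption; level 0 is a single-pass zero fill of free space only, which leaves existing files alone):

    # Zero out only the free space on the mounted volume
    diskutil secureErase freespace 0 "/Volumes/Macintosh HD"

    # Then re-check the SMART counters
    sudo smartctl -A /dev/disk0 | egrep 'Current_Pending_Sector|Offline_Uncorrectable'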


If your only errors were of the 197 and 198 type, and they're all now zero, you're done. There's no need to replace your drive unless this becomes a common occurrence, provided you're certain you're supplying steady power to your Mac and aren't powering it down hard.

Apr 20, 2014 3:26 AM in response to McGroarty

Many people are NOT familiar with booting into single user mode, running fsck, or running smartctl, the command-line tool from the smartmontools package. People buy third-party tools to make life easy on themselves. I also have questions about your methodology and results.


First off, asking people to boot into single-user, command-line mode to run fsck is a bit abnormal. Rather 1970s-ish is how I'd put it.


Second, fsck checks file system integrity, not necessarily bad blocks. The gist of this thread is about media failures, not B-Tree indexing problems. fsck variants only check the correlation between what's contained in the index files and what is stored on disk. They do not check for bad blocks in media regions that the file system isn't using.


Third, smartctl and smartmontools are well beyond the scope of most users. There are a few SMART monitoring tools on the market that haven't been mentioned here which are really nothing more than smartctl interpreters. They likely exec the smartctl application via an NSTask instance, obtain its output, and then map it out graphically for users. That makes it easier for users to understand. I have no qualm with people doing this as long as they make it clear that what they are doing is interpreting an open source product. As an FYI, I'm reasonably certain that Scannerz uses the same SMART monitoring that Apple does, and TechTool Pro and Drive Genius seem to have their own regimens. It doesn't matter, really, as long as they report it fairly, IMHO.


Fourth, SMART monitoring has been heavily criticized because it only detects problems after the fact. For example, Scannerz, TechTool Pro, and Drive Genius all scan the media of the entire drive at a low level, regardless of whether the drive has indexing problems and regardless of whether data is present. If any of these detect problems with a sector or a block, they will flag it, whereas SMART won't even be aware of bad blocks until a write failure is detected. I would recommend you look up SMART monitoring on Wikipedia and also take a look at the Google study about its reliability. There are many reports on the web where SMART has indicated a drive was about to fail at any moment, only to have the drive last for years, and, at the other extreme, cases where a drive fell flat on its face with no SMART indication of imminent failure. SMART is inconsistently implemented from manufacturer to manufacturer.


Fifth, I think it's bad advice to imply to people that a drive with potential problems is "OK." A drive with problems is a drive with problems. If all anyone has stored on it is games or other trivial data, so what? But what about people who have tax returns, critical financial data, or other important information stored on their drives? What you're proposing is essentially for people to take a "crapshoot" with what might be exceptionally important information. What you should be emphasizing is backups.


Finally, when "fans" of a product make me aware of it, I'm grateful. I don't care if they're just "fans" or perhaps diabolical marketing conspirators, as long as they're not obnoxious about it. Please keep in mind that on many general computer websites there are often outbreaks of Mac vs. PC arguments, and I seriously doubt that there's a barrage of Apple or Microsoft employees participating in them. A fan or satisfied user is likely nothing more than that.

Apr 20, 2014 7:39 AM in response to ThomasB2010

ThomasB2010 wrote:


Many people are NOT familiar with booting into single user mode, running fsck, or running smartctl, the command-line tool from the smartmontools package. People buy third-party tools to make life easy on themselves. I also have questions about your methodology and results.


As I said, my reply was for people who are comfortable with the shell. If people are more comfortable with a packaged solution, I expect they would have stopped with that caveat. However, some of us are more comfortable using tools that have been provided by the OS vendor, or which have matured over decades.



ThomasB2010 wrote:


First off, asking people to boot into single-user, command-line mode to run fsck is a bit abnormal. Rather 1970s-ish is how I'd put it.


Newer is not always better. Not all of us trust a lone dev or two to come up with something that somehow trumps the decades of experience and domain knowledge of OS developers and storage professionals, and to then paint a nice GUI over it as the cherry on top.


While I haven't explored this space much on the Mac, the Windows drive utility marketplace is full of snake oil that does nothing meaningful, or even does more harm than good.



ThomasB2010 wrote:


Second, fsck checks file system integrity, not necessarily bad blocks. The gist of this thread is about media failures, not B-Tree indexing problems. fsck variants only check the correlation between what's contained in the index files and what is stored on disk. They do not check for bad blocks in media regions that the file system isn't using.


[...]


You are wrong.


As I said in my reply, look at the "-S" option. fsck_hfs with the -S option performs a complete surface scan and not only identifies bad blocks, but also which files they're associated with (if any). It also identifies bad blocks that are not in use. I neither state nor imply that SMART is sufficient, except for learning more about an error a user has already encountered. Such an error was the subject of this thread.



ThomasB2010 wrote:


Fifth, I think it's bad advice to imply to people that a drive with potential problems is "OK." A drive with problems is a drive with problems. If all anyone has stored on it is games or other trivial data, so what? But what about people who have tax returns, critical financial data, or other important information stored on their drives? What you're proposing is essentially for people to take a "crapshoot" with what might be exceptionally important information.


In my reply, I explain that 197 and 198 errors are a frequent result of interrupted or unstable power during a write operation. This is particularly prevalent on the types of low-end drives included in some Mac models. I explain this condition, and I also explain that if the 197 and 198 errors are not rectified by overwrites, if they return repeatedly, or if SMART reports a different attribute out of tolerance, the drive is not okay.


Pulling power is not likely to damage a drive, but it is guaranteed to interrupt a write in progress, and that often causes incomplete block writes. Backing up a drive and restoring to a new drive is not a risk-free operation. To throw away every drive that fails to checksum a sector is to throw away a great deal of good hardware and to take on unnecessary additional risk.



ThomasB2010 wrote:


What you should be emphasizing is backups.


Yes, emphasizing backups is common sense. Hopefully it's something everybody here is already doing. It's also beyond the scope of the reply.


But sure. Please remember to exercise and take a multivitamin too.

Apr 20, 2014 12:19 PM in response to McGroarty

I'm the one who reported using Scannerz (or maybe I should say "pitched" it by mentioning it). I'm personally reasonably well versed in fsck and its family. I'm also well versed in watching people who think they know how to use it end up trashing an entire file system.


The main difference between Scannerz and fsck is that Scannerz doesn't assume an I/O error is a bad block. Faulty cables and other problems can generate the exact same error. If the I/O error was caused by a faulty cable, fsck could conceivably perform unnecessary operations on a perfectly good disk. Scannerz would detect the difference. fsck was never intended to be a test tool, whereas testing is all Scannerz does.


If you have a faulty cable that's generating intermittent and/or erratic I/O errors because of periodic disconnects, fsck would be seeing these as drive faults and modifying the file system when it shouldn't be modified. What you will end up with is a potentially ruined file system.


If someone wants to test the integrity of a hard drive or system, the last thing they need is something that's going to start modifying the file system in that process.

Mar 11, 2016 3:04 PM in response to MrHoffman

I have 94 bad blocks. I think it is the previous 10.8 FileVault data that cannot be erased: SMART is verified and all tests come back positive, but the capacity calculations are wrong; the 160 GB drive reads as 149 GB. I spent a week erasing, ten times with the 35-pass overwrite (which takes two days), with formatting and repartitioning into different numbers of partitions in between, using TechTool Pro and Deluxe and Disk Utility, and even taking out memory. I did every known test and repair. Apple said it should be gone, but the data is encrypted and the drive cannot read it, so it cannot write over it.

Mar 11, 2016 4:05 PM in response to sheffi

I am sorry to say, Apple, but after 13 years of struggle with 30 computers and 50 Apple products and lots of money, I have had only mixed success and have never finished my work: no bilingual user interface, no Ainu support, no Siberian language support, no written explanation in a dictionary of all the code written for OS X and iOS. So many things I thought would be possible with a PC; all those false promises that computing could solve all our problems.

Mar 11, 2016 6:01 PM in response to sheffi

If the volume is encrypted, you just destroy the keys and forget the data on the disk, because without the keys it is just a bunch of random bits.


If you insist on erasing, then 1 pass of zeros is good enough because even if someone can recover the random bits, without the encryption key there is nothing they can do with those random bits.


All you ever have to do with a FileVault encrypted disk, is ask Disk Utility to reformat the drive putting a new empty file system on it. That will erase the encryption keys, and if you do not have a separate copy, then once the keys are gone the encrypted bits on that drive are useless to everyone.
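For reference, a rough command-line equivalent; disk2 is only a placeholder, so confirm the identifier with diskutil list before erasing anything:

    diskutil list                              # confirm which disk is which first

    # Optional: a single pass of zeros over the whole device (more passes add
    # nothing here, since the leftover bits are encrypted anyway)
    diskutil secureErase 0 /dev/disk2

    # Put a fresh, empty Journaled HFS+ file system on the drive; the old
    # FileVault keys are gone and the remaining bits are useless to everyone
    diskutil eraseDisk JHFS+ Untitled /dev/disk2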


I am sorry to say Apple...

Apple is not reading this forum. It is just for User-to-User problem solving. If you wish to communicate with Apple, try the Feedback page

Aug 28, 2016 8:27 AM in response to Graham Perrin

Last post on this was March 2016; hopefully I get an answer. I just bought SoftRAID for my Mid 2010 Mac Pro running El Capitan, with four internal SATA drives and an external JBOD SansDigital RAID box on a Sonnetch Tempos Duo. I'm curious about using the SoftRAID "write zeros to first and last sectors" function; I see many saying it is a waste of time.


I'm a video editor, so backup is very important. I've been replacing drives over the past month, and I will replace the ones with bad sectors since SoftRAID is recommending they be removed. But I'm still curious about the zeros-to-disk function, and whether SoftRAID has a good reputation as software for repairing drives versus the Mac Disk Utility. Also, if I engage the zeros-to-disk function, will it totally erase the drive, or just write to those sectors and leave all my other data alone?


On a side note, knocking on wood, these Hitachi and Samsung internal drives have served me well, and well past the average 3-year life span of drives. Serious workhorses.

