Ambiguous RAID failure

6130 Views, 42 Replies, Latest reply: Nov 6, 2012 11:28 AM by Samiam872
rrgomes Level 1 (5 points)
May 18, 2012 8:39 AM

Early 2008 Mac Pro with Apple RAID card and 4 x 1TB drives installed.

I've had yet another RAID failure on my system (it's happened several times before), but this time the diagnostic is ambiguous and I need to make absolutely sure what has happened before I try to recover from it.

Overnight the RAID Utility log reported that Drive 3 had failed, that Raid Set RS1 was now degraded, and that there was no spare available for rebuild.

When I look now in RAID Utility at the status of the drives and of the array, all four drives show "green" (SMART: Verified and Status: Good), and Raid Set RS1 is "Viable (Degraded)". But it shows the drives in Bays 2, 3, and 4 as "Assigned" to Raid Set RS1, while the drive in Bay 1 is not: it shows as "Roaming".

I'm fairly sure that one of the drives actually is problematic, because I've been having increasingly frequent episodes of freezing and non-responsiveness on the system (spinning beachball).  In the past couple of days it got so bad that it was difficult to do anything at all following a restart; the freeze/beachball happened very soon after.  I remember now that I had exactly this symptom in the past, just prior to a drive failure that RAID Utility reported.

So I guess I need to replace one of the drives, mark it as "Spare" in RAID Utility, and let the array rebuild.

But WHICH drive should I replace? The log says that Drive 3 failed (I'm assuming that "Drive 3" is the drive in Bay 3), but that drive now shows as "good", as do all four drives. It's Drive 1 (i.e., the drive in Bay 1) that has been taken out of the array; drives 2, 3, and 4 are in Raid Set RS1. Is that a red herring? Is it possible that Drive 1 is bad even though the report was about Drive 3? (Drive 1 is the only drive that has never been replaced in the four years since I got this system.)

I think/fear that if I replace Drive 3, I'll blow away the array.

So it seems to me that I should rebuild the array by marking Drive 1 as spare (since it's the only drive that's unassigned), wait for it to complete, and then replace Drive 3 and rebuild again.  Or maybe I should just replace Drive 1 pre-emptively.

I don't know, but a rebuild takes a full 72 hours to complete, a nerve-wracking time because throughout it the system is vulnerable to a second drive failure, so I would prefer not to have to do it multiple times.
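The 72-hour figure can be turned into an average rebuild throughput with simple arithmetic. This is a minimal sketch, assuming "1TB" means 10^12 bytes (the decimal unit on drive labels) and that the rebuild rewrites the whole replacement drive once; the function names are illustrative only.

```python
SECONDS_PER_HOUR = 3600

def rebuild_rate_mb_s(capacity_bytes: int, rebuild_hours: float) -> float:
    """Average rebuild throughput in MB/s (1 MB = 10**6 bytes)."""
    return capacity_bytes / (rebuild_hours * SECONDS_PER_HOUR) / 1e6

def degraded_hours(rebuild_hours: float, rebuilds: int) -> float:
    """Total time without redundancy if the set is rebuilt several times in a row."""
    return rebuild_hours * rebuilds

if __name__ == "__main__":
    one_tb = 10**12  # assumption: decimal terabyte, as printed on drive labels
    print(f"average rate: {rebuild_rate_mb_s(one_tb, 72):.2f} MB/s")  # prints 3.86 MB/s
    print(f"two rebuilds: {degraded_hours(72, 2):.0f} hours degraded")  # prints 144 hours
```

At under 4 MB/s on average, the rebuild runs far slower than a single drive can stream sequentially, and rebuilding twice (first onto Drive 1, then again after replacing Drive 3) roughly doubles the degraded window to about six days.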

Can someone please tell me in detail what the safest/most correct way is to proceed in order to recover from this?

Thanks.

Mac Pro (Early 2008), Mac OS X (10.7.4)
  • Grant Bennet-Alder Level 8 (48,110 points)
    May 18, 2012 9:10 AM (in response to rrgomes)

    Use the Console utility to retrieve the EXACT TEXT of that message and post it.
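    If the log can be saved from Console as a plain-text file, a small filter can pull out the RAID-related lines verbatim, without paraphrasing them. This is a minimal sketch; the file name and the keyword list are hypothetical, not a documented RAID Utility message format.

```python
import re
import sys
from pathlib import Path

# Keywords are an assumption about what RAID-related lines contain;
# they are not a documented RAID Utility message format.
RAID_WORDS = re.compile(r"\b(drive|raid set|degraded|spare|roaming|failed)\b", re.IGNORECASE)

def raid_lines(log_text: str) -> list[str]:
    """Return matching log lines unchanged, so they can be posted exactly as logged."""
    return [line for line in log_text.splitlines() if RAID_WORDS.search(line)]

if __name__ == "__main__":
    # Hypothetical usage: python raid_lines.py exported_log.txt
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("exported_log.txt")
    if path.exists():
        for line in raid_lines(path.read_text(errors="replace")):
            print(line)
```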

    Mac Pro (Early 2009), Mac OS X (10.6.8), & Server, PPC, & AppleTalk Printers
  • Grant Bennet-Alder Level 8 (48,110 points)
    May 18, 2012 10:53 AM (in response to rrgomes)

    I see that there is a message with the word Drive and a 3 in it, but I think it is a huge assumption that any of those messages specifically calls out the drive in Bay 3.

    I would listen to the advice of RAID Utility, and replace the drive in Bay 1 if that is the one that is not running with the rest.

  • Grant Bennet-Alder Level 8 (48,110 points)
    May 18, 2012 4:35 PM (in response to rrgomes)

    For a Mac Pro, conventional wisdom is that the drives, even with the Apple logo on the label, are nothing special.

    For an iMac, you need a specific set of drives: ones equipped with a calibrated heat sensor and extra pins that the machine uses to read the drive temperature for fan speed control.

  • Grant Bennet-Alder Level 8 (48,110 points)
    May 19, 2012 3:58 PM (in response to rrgomes)

    It is also possible that it was slow due to marginal blocks on that drive that required multiple retries to read good data.
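    One way to look for such marginal areas is to time sequential reads and flag chunks that take unusually long. This is a minimal read-only sketch, assuming access to a raw device node such as /dev/rdiskN (N is a placeholder; reading a raw disk needs administrator rights); the 1 MiB chunk size and 0.5 s threshold are arbitrary. Behind the Apple RAID card the operating system most likely sees the RAID volume rather than the individual drives, so this would time reads across the whole set, not one member.

```python
import time
from typing import BinaryIO, Iterator, Optional, Tuple

CHUNK = 1024 * 1024   # 1 MiB per read (arbitrary; a multiple of the sector size)
SLOW_SECONDS = 0.5    # hypothetical threshold for a "suspiciously slow" read

def scan(f: BinaryIO, chunk: int = CHUNK,
         threshold: float = SLOW_SECONDS) -> Iterator[Tuple[int, Optional[float]]]:
    """Read f sequentially; yield (offset, seconds) for slow reads, (offset, None) for errors."""
    offset = 0
    while True:
        start = time.monotonic()
        try:
            data = f.read(chunk)
        except OSError:
            yield offset, None
            offset += chunk          # skip past the unreadable chunk and keep going
            f.seek(offset)
            continue
        elapsed = time.monotonic() - start
        if not data:
            break                    # end of file or device
        if elapsed > threshold:
            yield offset, elapsed
        offset += len(data)

def scan_path(path: str) -> None:
    # Unbuffered at the Python level; a raw /dev/rdiskN node also bypasses the OS cache.
    with open(path, "rb", buffering=0) as f:
        for offset, seconds in scan(f):
            status = "read error" if seconds is None else f"{seconds:.2f} s"
            print(f"offset {offset}: {status}")
```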

  • Grant Bennet-Alder Level 8 (48,110 points)
    May 21, 2012 7:36 AM (in response to rrgomes)

    Use Activity Monitor to check memory usage:

    Activity Monitor: View system memory usage

    Once you are confident in that information, especially that Pageouts are not killing your performance, change to this display:

    Runaway applications can shorten battery runtime
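    The Pageouts counter printed by the command-line vm_stat tool is cumulative since startup, so what matters is whether it keeps growing while the system is beachballing. This is a minimal sketch that samples vm_stat twice and reports the difference; the 10-second interval is arbitrary, and the parser assumes vm_stat's usual "Name:   123." line format.

```python
import re
import subprocess
import sys
import time

def parse_vm_stat(text: str) -> dict[str, int]:
    """Map vm_stat counter names to integers, e.g. 'Pageouts:  123.' -> {'Pageouts': 123}."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r'^"?([^":]+)"?:\s+(\d+)\.?$', line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def page_size(text: str, default: int = 4096) -> int:
    """Read the page size from the vm_stat header line, if present."""
    m = re.search(r"page size of (\d+) bytes", text)
    return int(m.group(1)) if m else default

def sample() -> str:
    return subprocess.run(["vm_stat"], capture_output=True, text=True, check=True).stdout

if __name__ == "__main__" and sys.platform == "darwin":
    first = sample()
    time.sleep(10)  # arbitrary sampling interval
    second = sample()
    delta = parse_vm_stat(second)["Pageouts"] - parse_vm_stat(first)["Pageouts"]
    mb = delta * page_size(second) / 1e6
    print(f"pageouts in the last 10 s: {delta} pages (~{mb:.1f} MB)")
```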

  • The hatter Level 9 (58,535 points)
    May 22, 2012 11:20 AM (in response to rrgomes)

    Do any third-party utilities show spare blocks remaining vs. used, or do a background scan of the media?
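    For hard drives, SMART usually exposes the used side of that ledger: how many sectors have been reallocated and how many are pending. This is a minimal sketch, assuming the free smartmontools package (smartctl) is installed and can reach the drive directly; drives that are members of a set on the Apple RAID card may not be visible to it at all, and /dev/disk0 is a placeholder.

```python
import shutil
import subprocess

# SMART attribute IDs commonly used for remapped / pending sectors:
# 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable
WATCHED_IDS = {5, 197, 198}

def remap_counters(smartctl_a_output: str) -> dict[str, int]:
    """Extract raw values of the watched attributes from `smartctl -A` text."""
    counters = {}
    for line in smartctl_a_output.splitlines():
        parts = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit() and int(parts[0]) in WATCHED_IDS:
            if parts[9].isdigit():
                counters[parts[1]] = int(parts[9])
    return counters

if __name__ == "__main__" and shutil.which("smartctl"):
    device = "/dev/disk0"  # hypothetical; pick the drive you want to inspect
    out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout
    print(remap_counters(out))
```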

    Whether the Apple RAID card is 'worth' the trouble is another question to consider.

    Have you thought of plain software RAID? Maybe SoftRAID 4.x, which is a solid product and will scan in the background when idle to ensure your drives do not experience I/O errors (even marginal ones; you set the threshold).

    With hardware RAID, if it is anything like it was with SCSI/SAS, where the drives all had the same model revision and firmware, people would buy spare drives at the time of purchase to ensure they had matching extras on hand later.

    If it feels like an I/O problem, or the drive taking longer to recover (time-limited error recovery), try switching to WD RE series drives. Scan for bad blocks. Rebuild the directory (is DiskWarrior 4.4+ 64-bit yet, and would it work properly?).
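    Time-limited error recovery matters because of a race between the drive and the RAID controller: a drive that keeps retrying a bad sector longer than the controller is willing to wait can be declared dead. This is a toy model, not a description of the Apple RAID card's firmware; all timings are hypothetical (7 seconds is a commonly cited TLER setting, but check the drive's documentation).

```python
from typing import Optional

def controller_outcome(retry_needed_s: float,
                       drive_recovery_limit_s: Optional[float],
                       controller_timeout_s: float) -> str:
    """Toy model of one hard-to-read sector on a RAID member.

    retry_needed_s: how long the drive would need to finally read the sector.
    drive_recovery_limit_s: TLER-style cap on error recovery; None means no cap.
    controller_timeout_s: how long the controller waits before declaring the drive dead.
    """
    if drive_recovery_limit_s is None:
        time_spent = retry_needed_s
    else:
        time_spent = min(retry_needed_s, drive_recovery_limit_s)
    if time_spent >= controller_timeout_s:
        return "drive dropped from set"
    if drive_recovery_limit_s is not None and retry_needed_s > drive_recovery_limit_s:
        return "error reported; controller rebuilds the sector from redundancy"
    return "read succeeds, slowly"

if __name__ == "__main__":
    # Hypothetical numbers: a drive needing 30 s of retries, an 8 s controller timeout.
    for limit in (None, 7.0):
        print(limit, "->", controller_outcome(30.0, limit, 8.0))
```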

    2TB RE drives (high density and I/O) in a 3-drive mirror (stripe reads using SoftRAID)

    4 x WD 10K drives using the new 1TB models (200MB/s) instead of 1TB Seagates...

    It may seem like drastic surgery, but I think it would provide better support and performance.

  • Grant Bennet-Alder Level 8 (48,110 points)
    May 22, 2012 11:21 AM (in response to rrgomes)

    If you are suspicious that Bad Blocks is the problem, third-party Utilities can do a live "Scan for Bad Blocks". Tech Tool Pro and Drive Genius come to mind.

    Another way to do this is to get a spare drive like the others (don't you need one on hand anyway?), add it to the set as a Hot Spare, and kick out a suspect drive (or power off and remove one) to get it to rebuild onto the Hot Spare.

    On the now-surplus drive, perform an Erase with the Security Erase option "Zero all Data" (one pass). This takes several hours to complete, but forces the drive to substitute spares from its private pool for any blocks found to be bad after zeroing. Then that drive can become the new Hot Spare, and you continue the process until all drives have been "laundered".
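    The rotation described above can be written out step by step. This is a minimal sketch that only lists the order of operations (it does not drive RAID Utility); the drive labels are hypothetical. Each rotation costs one full rebuild, so on the system in this thread, with 72-hour rebuilds, laundering all four members would mean about 288 hours of rebuilding.

```python
from typing import List, Tuple

def laundering_plan(members: List[str], spare: str) -> Tuple[List[str], List[str], str]:
    """Rotate one hot spare through every member of a RAID set.

    Returns (steps, final_members, final_spare).
    """
    current = list(members)
    steps = []
    for i, suspect in enumerate(members):
        steps.append(f"add {spare} as hot spare, remove {suspect}, rebuild onto {spare}")
        steps.append(f"zero all data on {suspect} (one pass); it becomes the next hot spare")
        current[i] = spare   # the spare has taken the suspect's place in the set
        spare = suspect      # the freshly zeroed drive is the next spare
    return steps, current, spare

if __name__ == "__main__":
    steps, final, spare = laundering_plan(["Bay 1", "Bay 2", "Bay 3", "Bay 4"], "New drive")
    for step in steps:
        print(step)
    print("final set:", final, "| spare:", spare)
```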

    Occasionally, a drive with good SMART status returns "Initialization Failed", caused by more than 10 bad blocks. The only proper response at that juncture is to throw your arms in the air and scream, "YES! I knew it!"

  • The hatter Level 9 (58,535 points)
    May 22, 2012 11:45 AM (in response to Grant Bennet-Alder)

    TTP (Tech Tool Pro) will scan, but that is of no use, to me or to others, when it doesn't map out the bad block or even tell you its sector ID (so that you could use a tool to map the sector out manually).
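    The sector ID matters because a low-level tool addresses a block by its logical block address (LBA), and its byte position on disk is simply the LBA times the sector size. This is a minimal read-only sketch, assuming 512-byte logical sectors (typical for 1 TB drives of that era, but still an assumption) and a device path you supply; it does not map anything out.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

def byte_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte position of logical block `lba` from the start of the disk."""
    return lba * sector_size

def read_sector(path: str, lba: int, sector_size: int = SECTOR_SIZE) -> bytes:
    """Read one sector by LBA (read-only; this does not remap anything)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, sector_size, byte_offset(lba, sector_size))
    finally:
        os.close(fd)
```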

    What did work, and what people miss, is OS 9's Drive Setup:

    http://support.apple.com/kb/TA21976

    Fix random lengthy pauses in OS X by correcting bad blocks ...

    Those are Seagate drives, so use Seagate's own SeaTools (burn the ISO to a CD), or use WD Lifeguard the same way, which I know does map bad blocks out and cure the ills of disk drive diseases.

This site contains user submitted content, comments and opinions and is for informational purposes only. Apple disclaims any and all liability for the acts, omissions and conduct of any third parties in connection with or related to your use of the site. All postings and use of the content on this site are subject to the Apple Support Communities Terms of Use.