
Ambiguous RAID failure

Early 2008 Mac Pro with Apple RAID card and 4 x 1TB drives installed.


I've had yet another RAID failure on my system (it's happened several times before), but this time the diagnostic is ambiguous and I need to make absolutely sure what's happened before I try to recover from it.


Overnight the RAID Utility log reported that Drive 3 had failed, that Raid Set RS1 was now degraded, and that there was no spare available for rebuild.


When I look now in RAID Utility at the status of the drives and of the array, all four drives show "green" (SMART: Verified and Status: Good), and Raid Set RS1 is "Viable (Degraded)". But it shows the drives in Bays 2, 3, 4 as "Assigned" to Raid Set RS1, while the drive in Bay 1 is not: it shows as "Roaming".


I'm fairly sure that one of the drives actually is problematic, because I've been having increasingly frequent episodes of freezing and non-responsiveness on the system (spinning beachball). In the past couple of days it got so bad that it was difficult to do anything at all following a restart; the freeze/beachball happened very soon after. I remember now that I had exactly this symptom in the past, just prior to a drive failure that RAID Utility reported.


So I guess I need to replace one of the drives, mark it as "Spare" in RAID Utility, and let the array rebuild.


But WHICH drive should I replace? The log says that Drive 3 failed (I'm assuming that "Drive 3" is the drive in "Bay 3"), but now that drive shows as "good"--as do all four drives. It's Drive 1 (i.e. the drive in Bay 1) that's been taken out of the array; drives 2, 3, and 4 are in Raid Set RS1. Is that a red herring? Is it possible that Drive 1 is bad even though the report was about Drive 3? (Drive 1 is the only drive that has never been replaced at any time in the four years since I got this system.)


I think/fear that if I replace Drive 3, I'll blow away the array.


So it seems to me that I should rebuild the array by marking Drive 1 as spare (since it's the only drive that's unassigned), wait for it to complete, and then replace Drive 3 and rebuild again. Or maybe I should just replace Drive 1 pre-emptively.


I don't know, but it takes a full 72 hours for the re-build to complete, a nerve-wracking time because throughout it the system is vulnerable to a second drive failure, so I would prefer not to have to do it multiple times.


Can someone please tell me in detail what the safest/most correct way is to proceed in order to recover from this?


Thanks.

Mac Pro (Early 2008), Mac OS X (10.7.4)

Posted on May 18, 2012 8:39 AM

42 replies

May 24, 2012 7:50 PM in response to rrgomes

From the booted backup the RAID rebuild continued for a while, with errors reported by RAID Utility and by CCC, with which I was trying to make a second bootable copy.


Abruptly, RAID Utility reported that a disk had been removed and reinserted (in real life, no such thing happened). Log says it was the Drive in Bay 2 (why didn't it tell me this before?).


But now RAID Utility says that the Raid Set consisting of drives 1, 3, and 4, is "Not Viable".


I think this means that the long struggle is over, that the volume is truly dead and not recoverable except by restoring from the backup that I have. Is that correct? Is there any point in replacing Drive 2 with another 1TB and trying to rebuild?


The most infuriating thing about this is that it was probably this drive (drive 2) that was the problem all along, but there was no indication of it. If I had known which drive to replace I wouldn't have ended up replacing good drives and ending up in a two-drive-failure situation.


Add me to the list of people who are unhappy with the Apple RAID Card. Next time around I'll choose a different RAID solution.


I'll proceed with the restore as soon as I have the new drives in hand (a day or two).

May 26, 2012 9:41 AM in response to The hatter

Got four new matched 1TB drives (decided I didn't want to increase the volume size since that complicates backup).


Installed them, booted from the backup, but now the system doesn't see the drive in Bay 1, though I've removed and reinstalled it several times.


Not something I can correct myself. I guess it's off to the Apple Store for repair. (Dollar bill with wings.)


Apple doesn't need an OS to be installed, right? (Because there's none on the internal drives now.) I can just describe the issue and leave it with them, I presume.

May 26, 2012 1:42 PM in response to rrgomes

The system changed its mind; now it can see all four drives. So, apart from being nervous about the state of the hardware, I guess I can proceed with the restore.


There's one thing I'm unsure of: I don't think that restoring it the way I'm planning to (create a new RAID-5 volume, then use CCC or SD to copy the bootable backup to the new volume) will recreate the Lion Recovery Mode partition. Is there an easy way to do that apart from going through a re-installation of Lion?

May 26, 2012 3:33 PM in response to rrgomes

I really do not see the value in putting the system on RAID 5. Better to put the system on a standalone drive, though I don't know where, how, or what your options are.


CCC, as I said, will clone Lion Recovery if you had made a clone of it (it only deals with volumes, unless it is used to do a full sector copy drive-to-drive, which isn't possible).


This might be the best time to really start from scratch.


There is a CalDigit S3/6G PCIe card that is bootable and includes 2 SATA3 ports and 2 USB3 ports as well, $139 from your OWC vendor. Usually people want it for an SSD to get the most out of it. Throw the system on a Samsung 830 120GB ($130) in the process, or on two and even go for a soft stripe array. (SoftRAID 4 added support for SSDs, which I trust over Apple's implementation too.)

May 26, 2012 9:39 PM in response to rrgomes

Dear All:


I have been a user of Apple RAID for a few years already, but my knowledge of RAID is quite limited. Thanks to all of you: reading through this entire thread, I learned a lot about RAID and various issues. I am interested in this thread because I now have exactly the same problem with my RAID setup as RRGOMES.


My configuration is as follows:


Mac Pro (early 2009 model) 2 x 2.66GHz Quad-Core, 16GB DDR3 1066MHz Memory

Apple RAID Card (HW Ver 2.00 and FW Ver E-1.3.20)


1 x 1TB HD (Apple original) in Bay 1 as boot disk configured as Enhanced JBOD

3 x 2TB HD (WD black series RE) in Bay 2 to Bay 4 configured as RAID 5


Mac OS 10.7.4


Yesterday after I woke up my Mac Pro from sleep mode, the RAID Utility launched itself and showed the following messages:



Degraded RAID set RS2 - No spare available for rebuild
Degraded RAID set RS2 - No spare available for rebuild
Degraded RAID set RS2
Drive 3:50014ee20352f5e8 missing - Previous drive status was inuse
Drive 3:50014ee20352f5e8 failure detected - Primary disk port unusable, previous drive status was inuse


Furthermore, RAID Utility shows:


1) Bay 1 (RS1 boot drive): Assigned, verified, Status good. viable (good)

2) Bay 2 (RS2 RAID 5): Assigned, verified, Status good, viable (degraded)

3) Bay 3 (RS2 RAID 5): Roaming, verified, Status good

4) Bay 4 (RS2 RAID 5): Assigned, verified, Status good, viable (degraded)


It also showed "Severe Events" message: "Degraded RAID set RS2 - No spare available for rebuild."


Here are my questions and comments:


1) Comment to RRGOMES: I used SuperDuper to clone my boot disk to an external disk and it has been working beautifully for me all along. I can even boot from the external backup and run it as my main disk in case I need to perform any maintenance on my boot disk, such as a disk defrag. CCC will probably do the same or similar, but I have no experience with it.


I have Disk Warrior 4, Drive Genius 3, and TechTool Pro 6. All of them are good disk maintenance tools, but Disk Warrior 4 and Drive Genius 3 are the ones I use most. If you need only one, either Disk Warrior or Drive Genius is a good choice. (To be clear, I am not affiliated with any of these vendors; this is just from a user's standpoint.) These tools come in handy when you need to do an emergency rescue. Drive Genius is good in that it keeps track of your disks' status and warns you of any disk failure symptom before it gets worse. Unfortunately, none of these goodies apply to diagnosing RAID sets.


2) My question: it seems odd that, all of a sudden, RRGOMES and I, and maybe others, are having this degraded problem. I don't think it is a coincidence. Our problems are identical even though our hardware is not the same. I initially had 3 x 1TB disks as my RAID 5 when I bought my Mac Pro (mid 2009), but I replaced all of them last year (Aug 2011) with the above WD 2TB disks. So it has not even been one year and both disks failed at the same time! The RAID set had been working very well until yesterday, when I woke up my Mac Pro and was surprised by the RAID Utility message. There was no warning whatsoever.


I agree with RRGOMES that the RAID message does not make clear which drive actually failed.


3) I tried to understand more of the above message from RAID Utility. It states "Drive 3 ... drive status was inuse". This is strange: which one is Drive 3? It could be interpreted as "Bay 3", which is "Disk 2 in Bay 3" of my 3-disk RAID set. Or could it be "Disk 3 in Bay 4"? Very confusing.


In any case, the message says the previous drive status was "inuse". What does that mean?


I have a gut feeling (a feeling only, with no hard evidence) that this degraded RAID set is not really due to disk failures! How can it be so coincidental that several of us are having the same problem?


WD black series RE drives are very solid drives (enterprise class drives). Two failures at the same time is very rare; it could happen, but after less than one year of usage? What I will do is take each of those drives, run a single zero-write pass, and then do a full scan of each drive for bad blocks to see whether they are problem drives. It will take a while to do this, but I will report my findings later.


4) I suspect it may have something to do with a recent OS update rather than a hardware issue. Will anyone share their thoughts?


5) I tried replacing one of the drives (Bay 3, or Drive 2 in my RS2) with a brand new disk, but nothing happened. I also clicked "make spare" for this disk in the RAID menu. It still did nothing. I thought that if I swapped the disk, the rebuild would start automatically. But the "Task" entry in the Controller section (in the left-hand panel of RAID Utility) showed 0 tasks. How do I get it to rebuild my RAID set?


6) Any suggestions on how I can salvage this RAID set other than starting a new RAID set completely? Thanks.

May 27, 2012 7:53 AM in response to The hatter

Thanks for all your help. For the moment all I want/need to do is get the system back to its previous state--single easily-backed up RAID-5 volume--minus the horrible performance issues (likely due to the failing disk drive) so I'm going to recreate the original RAID-5 volume onto the new drives using the Apple RAID Card and restore my bootable backup to it.


From what I've read I should also be able to recreate the Lion Recovery volume by reinstalling Lion (i.e. re-downloading it from the App Store and re-running it) so that should be covered as well.


However, you've convinced me that I can do better than what I have. Perhaps, even though this system is four years old, it's still possible to improve its performance without having to spend a lot of money that would be better spent on a newer Mac system later this year or next.


I like the ostensible protection and performance of RAID-5 (even if not Apple's implementation of it). Are you really saying that a software RAID solution using something like SoftRAID could outperform that?


I'd like to discuss alternatives (including a RAID card from another vendor, and/or using an SSD for the system). Is that appropriately done here, or in a separate thread?

May 27, 2012 10:51 AM in response to rrgomes

Think we've kicked this dead horse into the ground enough.


Truthfully, I'd try to sell you on external RAID6 for hardware array box.

Software (SoftRAID) for inside.

PCIe SSD card(s) for high speed I/O

MacGurus for their options and forum

https://www.macgurus.com/forums/index.php


www.macperformanceguide.com

for setup and upgrade options - he recently went with PCIe SSDs and 4 x 4TB Hitachi drives, using half of each drive to achieve as much I/O as a Mac Pro can get on its 4 native SATA drive ports.


I would reload your data, just not to the Apple RAID controller array.


And close this thread.

Jul 19, 2012 7:13 PM in response to The hatter

Just wanted to follow up on this. It took me a while to get all the pieces together. But after restoring to the Apple RAID with four new 1 TB drives, only to see the same problems return, I saw reason and gave up on the Apple RAID card entirely.


Removing the card was a pain because it was connected to a cable with very little slack that had to be reconnected to a port on the motherboard. But I did so, installed 2 TB drives (that I already owned) in the four drive bays, used SoftRAID to make a 3 TB RAID-0 stripe using the fastest half of three of those drives, and installed an OWC Mercury Accelsior 480 GB PCIe SSD. I use the Accelsior as a boot drive and the RAID-0 stripe as a data volume. The other 2 TB drive is used to make a bootable backup clone.
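The capacity figures in this thread can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes the textbook rules that RAID-5 gives up one drive's worth of space to parity while RAID-0 simply sums its members.

```python
def raid5_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID-5 set: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_tb


def raid0_usable_tb(member_count, member_tb):
    """Usable capacity of a RAID-0 stripe: members simply add up (no redundancy)."""
    if member_count < 2:
        raise ValueError("a stripe needs at least two members")
    return member_count * member_tb


# Original setup from the first post: four 1 TB drives in RAID-5.
original_volume_tb = raid5_usable_tb(4, 1.0)

# Setup described above: half of each of three 2 TB drives, striped.
new_stripe_tb = raid0_usable_tb(3, 2.0 / 2)
```

Under those assumptions both layouts come out at 3 TB, so the new data volume matches the size of the old RAID-5 volume. The stripe has no parity protection, which is why the separate bootable clone matters.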


I've also ordered the CalDigit USB 3.0/eSATA card, though it hasn't arrived yet. The system already has a 2-port eSATA card so this isn't really needed but it will help with backups and such.


It's only been a few days so it may be too early to declare success, but so far the system is behaving better than it ever has. If it holds up then I can probably wait a little longer before getting a newer system. Perhaps a new Mac Pro in 2013 if Apple finally does release a new one.

Nov 6, 2012 11:28 AM in response to rrgomes

I found that if you use the raidutil command-line command 'list driveinfo', it will specify which bay has the issue. I have had this exact problem 4x now, and am very thankful to read all the different ideas posted here.


raidutil list driveinfo

Drives Raidset Size Flags

-----------------------------------------------------------------

Bay #1 RS1 2.00TB IsMemberOfRAIDSet:RS1 IsReliable

Bay #2 RS1 2.00TB IsMemberOfRAIDSet:RS1 IsReliable

Bay #3 <none> 2.00TB IsReliable IsNotAssigned IsRoaming
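For anyone who wants to check this output from a script, here is a minimal Python sketch that picks out the bay that has dropped out of the set. It only parses text shaped like the listing above; the sample text, function names, and the assumption that the columns are whitespace-separated are mine and not confirmed by any raidutil documentation.

```python
import re

# Sample text copied from the listing above (column spacing is assumed).
SAMPLE = """\
Drives  Raidset  Size    Flags
-----------------------------------------------------------------
Bay #1  RS1      2.00TB  IsMemberOfRAIDSet:RS1 IsReliable
Bay #2  RS1      2.00TB  IsMemberOfRAIDSet:RS1 IsReliable
Bay #3  <none>   2.00TB  IsReliable IsNotAssigned IsRoaming
"""

# One row: "Bay #N", the RAID set name (or <none>), the size, then the flags.
ROW = re.compile(r"^Bay #(\d+)\s+(\S+)\s+(\S+)\s+(.*)$")


def parse_driveinfo(text):
    """Return one dict per bay row; header and separator lines are skipped."""
    drives = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            bay, raidset, size, flags = match.groups()
            drives.append({
                "bay": int(bay),
                "raidset": None if raidset == "<none>" else raidset,
                "size": size,
                "flags": flags.split(),
            })
    return drives


def unassigned_bays(drives):
    """Bays that belong to no RAID set or carry the IsRoaming flag."""
    return [d["bay"] for d in drives
            if d["raidset"] is None or "IsRoaming" in d["flags"]]
```

With the sample above, unassigned_bays(parse_driveinfo(SAMPLE)) picks out Bay 3, the only row with no RAID set and the IsRoaming flag.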



I have rebuilt my RAID 4x. I am not creating another thread because I am not seeking help, but I think my findings are relevant to this thread. I hope that's appropriate.


Overview:

In the 4 times my server's RAID Utility booted sporting 'Degraded RAID set RS1 - No spare available for rebuild', I have replaced all drives, swapped cradle locations, replaced the RAID card battery, and done a little research. Thus I have eliminated the drives and the cradles as the culprit. Clearly I have not learned, nor adopted a better system. I plan to go over this thread again and research another option before this occurs again. I forget whose idea it was that this problem might be software and not hardware, but note below that I have totally different hardware and software. So, if it's software, then this is happening all over again.


Hardware:

Model Identifier: Xserve2,1

Processor Name: Quad-Core Intel Xeon

Processor Speed: 2.8 GHz

Number Of Processors: 2

Total Number Of Cores: 8

Xserve RAID Card:

Hardware Version: 1.00

Firmware Version: M-2.0.5.5

Expansion ROM Version: 0018

Shutdown Status: Normal shutdown

Write Cache Enabled: Yes

Battery Info:

Firmware Revision: 1.0.2

First Installed: 12/29/07 4:05 PM

Last Date Conditioned: 10/2/12 11:35 PM

State: Working battery

Fault: Normal battery operation

Status:

Charging: Yes

Conditioning: No

Connected: Yes

Discharging: No

Sufficient Charge: Yes


Software:

System Version: Mac OS X Server 10.6.8 (10K549)


Issue:

With my Xserve I have a different RAID card than the Mac Pro, with 3 drives configured in RAID 5. I have had drives 1 and 3 fail, both 2x, and not consistently with cradle swapping. At first I was using the 3 1TB drives that came with the server. After the 2nd failure I swapped them out for 3 2TB Hitachi HDS722020ALA330 drives and started fresh. Since then it has failed 2 more times. Each time I follow the same rebuild steps below.


Raid Utility Log:

Monday, November 5, 2012 4:56:23 PM ET Degraded RAID set RS1 - No spare available for rebuild critical

Monday, November 5, 2012 3:59:01 PM ET Degraded RAID set RS1 - No spare available for rebuild critical

Monday, October 22, 2012 1:41:40 PM ET Degraded RAID set RS1 warning

Monday, October 22, 2012 1:41:38 PM ET Drive 3:5000cca221c077dc missing - Previous drive status was inuse critical

Monday, October 22, 2012 1:41:38 PM ET Drive 3:5000cca221c077dc failure detected - Primary disk port unusable, previous drive status was inuse critical
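Since the thread keeps coming back to the question of which physical disk "Drive 3" is, one option is to pull the long identifier out of the log and compare it with the drives themselves. The Python sketch below assumes the 16-hex-digit string after the colon identifies the physical drive (it looks like a World Wide Name, but nothing in this thread confirms that), and it assumes every log line ends with a severity word as shown above. The function names are hypothetical.

```python
import re

# Log lines copied from the excerpt above.
LOG = """\
Monday, November 5, 2012 4:56:23 PM ET Degraded RAID set RS1 - No spare available for rebuild critical
Monday, October 22, 2012 1:41:40 PM ET Degraded RAID set RS1 warning
Monday, October 22, 2012 1:41:38 PM ET Drive 3:5000cca221c077dc missing - Previous drive status was inuse critical
Monday, October 22, 2012 1:41:38 PM ET Drive 3:5000cca221c077dc failure detected - Primary disk port unusable, previous drive status was inuse critical
"""

# "Drive N:<16 hex digits> <message> <severity>" at the end of a log line.
DRIVE_EVENT = re.compile(
    r"Drive (\d+):([0-9a-fA-F]{16}) (.+?) (critical|warning)$"
)


def drive_events(text):
    """Return (drive number, identifier, message, severity) per drive event."""
    events = []
    for line in text.splitlines():
        match = DRIVE_EVENT.search(line.strip())
        if match:
            number, ident, message, severity = match.groups()
            events.append((int(number), ident.lower(), message, severity))
    return events


def drives_implicated(text):
    """Distinct (drive number, identifier) pairs named in drive events."""
    return sorted({(number, ident) for number, ident, _, _ in drive_events(text)})
```

If the identifier in the log can be matched against an identifier printed on the drive label or reported by another tool, that would settle which bay the log is talking about without relying on the "Drive 3" numbering.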


Steps to rebuild raid:

bash-3.2# raidutil list driveinfo

Drives Raidset Size Flags

-----------------------------------------------------------------

Bay #1 RS1 2.00TB IsMemberOfRAIDSet:RS1 IsReliable

Bay #2 RS1 2.00TB IsMemberOfRAIDSet:RS1 IsReliable

Bay #3 <none> 2.00TB IsReliable IsNotAssigned IsRoaming


bash-3.2# raidutil -v modify drive -s -d 3

Modifying drives... complete.


bash-3.2# raidutil -v modify drive -A -d 3

Modifying drives... complete.


bash-3.2# raidutil list taskinfo

Tasks Status

------------------------------------------

Rebuild (RS1) 0% complete

bash-3.2#
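Because a rebuild can run for days, it may help to track the percentage from a script instead of rerunning the command by hand. The Python sketch below only parses text shaped like the 'list taskinfo' output above; how the text is captured (for example by running raidutil from a script) is left open, and the function names and the assumed line format are mine.

```python
import re

# Sample text copied from the transcript above.
TASKINFO = """\
Tasks Status
------------------------------------------
Rebuild (RS1) 0% complete
"""

# One task row: task name, RAID set in parentheses, percentage complete.
TASK = re.compile(r"^(\w+) \((\w+)\)\s+(\d+)% complete$")


def parse_tasks(text):
    """Return one dict per task row; header and separator lines are skipped."""
    tasks = []
    for line in text.splitlines():
        match = TASK.match(line.strip())
        if match:
            kind, raidset, percent = match.groups()
            tasks.append({"task": kind, "raidset": raidset, "percent": int(percent)})
    return tasks


def rebuild_progress(text, raidset):
    """Percent complete of the rebuild on the given RAID set, or None if absent."""
    for task in parse_tasks(text):
        if task["task"] == "Rebuild" and task["raidset"] == raidset:
            return task["percent"]
    return None
```

A None result means no rebuild task is listed for that set. That matches the "0 tasks" situation described earlier in the thread, where marking a spare did not start a rebuild.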


Next steps:

I may start with an OS upgrade, as it's about time. But I suspect this will have no effect on the problem, considering you have had it with 10.7.4. Following that, I am going to look for a PCI card and 6mbs uplink per drive enclosure. Any help here is much appreciated.
