Software RAID Failure - my experience and solution

I just wanted to share this information with the iCloud community.


I searched a bit and did not find much information that was useful with regard to my software RAID issue.


I have a 27-inch Mid 2011 iMac with an SSD and a hard drive, which has been great.


I added an external hard drive (I think if I mention any brand name the moderator will delete this post), which is a nice aluminum case with two 3 TB hard drives in it and a big blue light on the front, connected via Thunderbolt. This unit is about 2 years old, and I have it set up as a 3 TB mirrored RAID (RAID 1), a software RAID configured with Mac OS Disk Utility.


At one point I had a minor glitch, which was fixed using another piece of software (again, if I mention a brand the moderator will delete this post) with a name like 'Harddrive Fighter' or similar, LOL. Otherwise that RAID has served me well as the home for my Time Machine backup, Aperture Vault, etc. (I created a 1.5 TB sparse bundle for Time Machine so that the backup would not use the entire 3 TB.)


I recently purchased a second aluminum block of drives, and set that up as a 4 TB RAID 1.

Each of the two RAIDs is set with the option “Automatically rebuild RAID mirror sets” checked.


I put only about 400 GB on the new RAID to let it sit for a ‘burn-in period.’


A few days ago the monitoring software from the vendor who sells the aluminum block of drives told me I had a problem: one of the drives had “Failed.” Strangely enough, the monitoring software does not distinguish between the units, so you cannot tell which pair has the issue, and I assumed it was the new 8 TB model. Long story short, it was the older 6 TB model, but that does not matter for this discussion.


I contacted the vendor and this is part of their response.


“This is an indication that the Disk Utility application in Mac had a momentary problem communicating with the drive mechanism. As a result, it marked that drive as "failed" in the header information. Unfortunately, once this designation is applied to a drive by the OS, the Disk Utility will thereafter refuse to attempt any further operations with that disk until the incorrect "failed" marker is manually cleared off the drive.”


That did not sound very good to me... backup killed by a SOFTWARE GLITCH?

“The solution is to remove the corrupted volume header, and allow the generation of a new one….This command will need to be done for each disk in the array… (using Terminal)…


diskutil zerodisk (identifier)


…3. After everything is finished, you should be able to exit Terminal, and go back into the Disk Utility Application to re-configure the RAID array on the device.”



Furthermore they said.


“If the Disk Utility has placed a flag into the RAID array header (which exists on both drives) then performing this procedure on a single drive will not correct anything.”


And…


“When a drive actually does fail, it typically stops appearing in the Disk Utility application altogether. In that circumstance, it will never be marked "failed" by the Disk Utility, so the header erase operation is not needed.”


This all sounded like a bad idea to me. And what does the vendor's RAID monitor software say then? “Disk Really Really FAILED, check for a fire.”

As I tried to figure out which RAID pair actually had the bad drive, I stumbled on a solution.


First, I noted that the OS Disk Utility did NOT show a fault in the RAID. It listed both RAIDs as “Online.” Thus it saw no need to rebuild and did not begin the rebuild process.


The vendor's disk monitor software saw some fault, but the Mac was still able to read and write to the RAID, on both disks in the mirror. I wrote a folder to the RAID and, with various rebooting steps, pulled the “Bad” drive and looked at the “Good” drive... the folder was there. I put the Bad drive back in and pulled the Good drive, and the folder was there on the “Bad” drive too. So it wrote to both drives. AND THE VENDOR'S MONITORING SOFTWARE SHOWED THE PREVIOUSLY LABELED ‘BAD’ DRIVE AS ‘GOOD’ AND THE MISSING DRIVE SLOT AS ‘BAD’.


My stumbled-upon FIX: I moved a bunch of files off the failed RAID to the new RAID, but before I had moved the sparse bundle, a 500 GB folder of movies and some other really big folders, the DISK UTILITY WINDOW (which I still had open) now showed that the RAID had a defect, and it began rebuilding the mirror set itself, out of the blue! I don't know why this happened. Perhaps moving about half of the data off of it did something? Any ideas?


This process took a few hours as best I can tell (I let it run overnight), and the next day the RAID was fine and the vendor's RAID monitor no longer showed a fault.


So, the vendor's RAID monitoring software reports a “FAILED” drive without any specific error codes to look up. Perhaps they could give the user more info on the specific fault? The vendor's support line said with certainty “the Volume Header is corrupted” and THE ONLY FIX is to completely ZERO THE DRIVE! As it turns out, this was not necessary.


And the stick in the eye to me…..


“I've also sometimes seen the drives get marked as "failed" by the disk utility due to a shaky connection. In some cases, swapping the ends of the Thunderbolt cable will help with this. Something to try, perhaps, if your problems come back. “


Ya Right…..


Mike

iMac, Mac OS X (10.7.2), 27

Posted on May 29, 2014 3:35 PM

6 replies

Jun 27, 2014 4:21 AM in response to Aronis

It's failed again twice more.


Disk Utility is labeling the first disk of the RAID as failed. I rebuilt the RAID set again and the error disappeared.


LaCie support says it could be caused by


1. Daisy Chaining the drives

2. The thunderbolt cable

3. The iMac

4. An act of God

5. The Windows 7 machine across the other side of the room

6. My 1Dx and photo software

7. The Shelf Elf sitting in his box awaiting Christmas time

8. Oh forgot, bad power supply plugged into the RAID box.

9. But not a bad hard drive. No can't be that.


So I'm doing the zeroing thing which is a Microsoft type solution.


"Microsoft Support, can I help you?"

"My windows computer won't start"

"First we will have you do f disk and wipe the hard drive."


etc

Jun 27, 2014 7:18 AM in response to Aronis

You mentioned LaCie. Most of their products use software RAID which, as you have found, is not 100% reliable. Apart from one alternative software RAID solution (see below), next time you might want to consider an external drive which has hardware RAID (see also below).


Software RAID - http://www.softraid.com

Western Digital - http://www.wdc.com/en/products/products.aspx?id=630


In theory it should not happen i.e. the software should be designed to cope, but a possible cause of your problems would be if one drive was slower to start up than the other. When the Mac tries to use it, it might then only see 'half' the RAID and mark it as bad. This might be due to using different models of drive, or one perhaps starting to show early signs of hardware failure and taking longer to spin up or move or calibrate the heads.

Jun 27, 2014 11:14 AM in response to John Lockwood

Thank you for the feedback.


I had great results with the first LaCie 2big 6 TB model, set as a 3 TB mirrored RAID, for about 2 years.


The error showed up shortly after I added a second LaCie 2big 8 TB model, set as a 4 TB mirrored RAID. They were daisy chained.

The LaCie monitor showed the fault. Support wanted me to wipe both drives by zeroing out each, but Disk Utility rebuilt the mirror spontaneously and the error disappeared.


I did some investigating when it first gave the error. I removed the "good" drive from the physical box and restarted. Mac showed the remaining drive as healthy and all the data was there. Then I plugged the "Good" drive back in and removed the "Bad" drive and again the data was intact on the other mirror.


So both drives appeared to be intact. Disk Utility just rebuilt the mirror by overwriting the drive which had been reported as "Degraded."


I now have the two units plugged into the two different Thunderbolt ports on the back of the Mac.

I'm going to do the ZERO thing as they claim that will prove the actual hard drive is mechanically functioning correctly.


Mike


I had hardware RAID on my Windows machine (a prior build I used since 2000) and the RAID controller died... LOL. The rest of the motherboard was fine. It was an Asus. The data on the striped RAID was gone, but my mirror drives were fine. Now I have a new Windows rig with Windows 7 and several mirrored RAID drives, LOL... and one set for striping (game software). I actually thought I was buying a hardware RAID when I got the first LaCie.

Jul 3, 2014 6:30 PM in response to Aronis

Follow up.


After going through the zeroing process and rebuilding the RAID set three times, with various configurations, LaCie finally agreed to repair the unit under warranty.


I tried swapping the power supplies and Thunderbolt cables, and tried taking the drive out of the daisy chain with its newer big brother. It still failed after a few days.


I just wanted to share more of what I learned with regard to rebuilding the RAID sets via the Terminal. The commands can be typed partially and a help paragraph will come up to give VERY cryptic descriptions of the proper use of the commands.


First, in Terminal you can use the command "diskutil appleRAID list" to list the drives which are in the RAID. This gives you the ID (the DevNode and UUID) for each physical drive. For example:


AppleRAID sets (1 found)
===============================================================================
Name:                 LaCie RAID 3TB
Unique ID:            84A93ADF-A7CA-4E5A-B8AE-8B4A8A6960CA
Type:                 Mirror
Status:               Online
Size:                 3.0 TB (3000248991744 Bytes)
Rebuild:              manual
Device Node:          disk4
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk3s2   D53F6A81-89F1-4FB3-86A9-8808006683C2  Online     3000248991744
-  disk2s2   E58CA8F5-1D2C-423A-B4BE-FBAA80F85879  Spare      3000248991744
===============================================================================
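For anyone who would rather check the member list from a script than read it by eye, here is a minimal sketch that parses a listing like the one above and reports which members are not "Online". It works on pasted text only and does not call diskutil. The column spacing in the sample and the helper names (`parse_members`, `not_online`) are assumptions for illustration; real output may differ between Mac OS versions.

```python
import re

# Sample listing taken from the post; the exact column spacing is an assumption.
SAMPLE = """\
AppleRAID sets (1 found)
===============================================================================
Name:                 LaCie RAID 3TB
Unique ID:            84A93ADF-A7CA-4E5A-B8AE-8B4A8A6960CA
Type:                 Mirror
Status:               Online
Size:                 3.0 TB (3000248991744 Bytes)
Rebuild:              manual
Device Node:          disk4
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk3s2   D53F6A81-89F1-4FB3-86A9-8808006683C2  Online     3000248991744
-  disk2s2   E58CA8F5-1D2C-423A-B4BE-FBAA80F85879  Spare      3000248991744
===============================================================================
"""

UUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
# A member row: index (a number or "-"), device node, UUID, then status and size.
MEMBER_RE = re.compile(r"^\s*(\d+|-)\s+(\S+)\s+(" + UUID + r")\s+(.*\S)\s*$")


def parse_members(listing):
    """Return one dict per member row found in the listing text."""
    members = []
    for line in listing.splitlines():
        match = MEMBER_RE.match(line)
        if not match:
            continue
        index, node, uuid, rest = match.groups()
        tokens = rest.split()
        # A trailing all-digit token is the size in bytes; the rest is the status.
        if len(tokens) > 1 and tokens[-1].isdigit():
            tokens = tokens[:-1]
        members.append(
            {"index": index, "node": node, "uuid": uuid, "status": " ".join(tokens)}
        )
    return members


def not_online(listing):
    """Members whose status is anything other than Online (Spare, Missing/Failed, ...)."""
    return [m for m in parse_members(listing) if m["status"] != "Online"]


for member in not_online(SAMPLE):
    print(member["node"], member["uuid"], member["status"])
```

With the sample above this flags only disk2s2, the member listed as Spare.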



In my situation with the failed RAID, I had an extra disk in this list with the status of Missing/Failed.

The command is "diskutil appleRAID remove" and the cryptic help paragraph says:


Usage: diskutil appleRAID remove MemberDeviceName|MemberUUID
RAIDSetVolumePath|RAIDSetDeviceName|RAIDSetUUID

MemberDeviceName|MemberUUID is the DevNode or UUID listed by the "diskutil appleRAID list" command, and RAIDSetVolumePath|RAIDSetDeviceName|RAIDSetUUID is the Device Node for the RAID, which here is /dev/disk4.
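Since the argument order is easy to get backwards (member first, RAID set second), here is a small sketch that only builds and prints the remove command so it can be checked before pasting it into Terminal. It never runs diskutil. The helper name `remove_member_command` and the validation patterns are assumptions of mine, not part of diskutil.

```python
import re
import shlex

UUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")
DEVICE_RE = re.compile(r"^(/dev/)?disk\d+(s\d+)?$")


def remove_member_command(member, raid_set):
    """Build argv for: diskutil appleRAID remove <member> <RAID set>.

    member   -- member device name (e.g. disk2s2) or member UUID
    raid_set -- RAID set volume path, device name (e.g. /dev/disk4) or UUID
    """
    if not (UUID_RE.match(member) or DEVICE_RE.match(member)):
        raise ValueError("member must be a device name or UUID: %r" % member)
    if not (
        UUID_RE.match(raid_set)
        or DEVICE_RE.match(raid_set)
        or raid_set.startswith("/")  # a volume path such as /Volumes/...
    ):
        raise ValueError("RAID set must be a path, device name or UUID: %r" % raid_set)
    return ["diskutil", "appleRAID", "remove", member, raid_set]


# Dry run only: print the command instead of executing it.
argv = remove_member_command("E58CA8F5-1D2C-423A-B4BE-FBAA80F85879", "/dev/disk4")
print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing is deliberate: removing the wrong member from a mirror is not something to automate blindly.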

I used this command to remove the third entry (Missing/Failed). I did not copy the Terminal window text on that one, so I cannot show the list of three disks.

I could not get it to remove the disk2s2 disk listed as Spare, as it gave an error message:

Michaels-iMac:~ mike_aronis$ diskutil appleraid remove E58CA8F5-1D2C-423A-B4BE-FBAA80F85879 /dev/disk4
Started RAID operation on disk4 LaCie RAID 3TB
Removing disk from RAID
Changing the disk type
Can't resize the file system on the disk "disk2s2"
Error: -69827: The partition cannot be resized


But I was able to remove it using the graphical interface Disk Utility program using the delete key.

I then rebuilt the RAID set by dragging the second drive back into the RAID set.


I could not get the command "diskutil appleRAID update AutoRebuild 1 /dev/disk4" to work: it tried to execute but HUNG. As a further troubleshooting step (this was not suggested by LaCie tech), I put the two drives into my newer LaCie 2big and rebuilt the RAID. I am going to leave it set up that way for a few days before I ship the old unit back, just to see if the old drives work fine in the new RAID box (thus proving the old RAID box is the problem). I tried the AutoRebuild 1 command again just now and it gave an error.


Michaels-iMac:~ mike_aronis$ diskutil appleraid update autorebuild 1 /dev/disk4
Error updating RAID: Couldn't modify RAID (-69848)
Michaels-iMac:~ mike_aronis$


In my haste to rebuild the RAID set for the third or fourth time, as LaCie led me through the test-this-and-test-that phase, I forgot to click the "Auto Rebuild" option in the Disk Utility program.
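The listing itself shows whether that option was missed: the RAID set header in "diskutil appleRAID list" has a "Rebuild:" field, which read "manual" in my case. A minimal sketch for pulling the header fields out of pasted text follows. The function names are hypothetical, it handles a single RAID set only, and the only field value it relies on is "manual" as it appears in the listing in this thread.

```python
# Header portion of a "diskutil appleRAID list" listing, as posted in this thread.
HEADER = """\
Name: LaCie RAID 3TB
Unique ID: 84A93ADF-A7CA-4E5A-B8AE-8B4A8A6960CA
Type: Mirror
Status: Online
Size: 3.0 TB (3000248991744 Bytes)
Rebuild: manual
Device Node: disk4
"""


def raid_set_fields(listing):
    """Collect 'Key: value' header lines into a dict (single RAID set only)."""
    fields = {}
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def auto_rebuild_is_off(listing):
    """True when the header reports Rebuild: manual."""
    return raid_set_fields(listing).get("Rebuild", "").lower() == "manual"


fields = raid_set_fields(HEADER)
if auto_rebuild_is_off(HEADER):
    print("Auto rebuild is off for", fields["Name"], "on", fields["Device Node"])
```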


Question for the more experienced:


As I was working on this issue, I noticed that each time I rebooted and did work in the Terminal (with and without the RAID plugged in to the Thunderbolt connection), the list of drives would change and my main boot drive would not stay listed as drive 0! Sometimes it would be drive 0, and sometimes the RAID would be listed as drive 0. It's strange to me... I would have thought the designations for drive 0 and drive 1 would always go to my two built-in drives (SSD and spinning drive).
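As far as I understand it, the diskN numbers are handed out in the order the system discovers the drives, so they can change from one boot to the next, while the member UUIDs in the listing stay the same. A minimal sketch of looking up the current device node for a known UUID in pasted listing text is below. The second listing is invented to stand in for a later boot, and the function name is hypothetical.

```python
import re

MEMBER_UUID = "D53F6A81-89F1-4FB3-86A9-8808006683C2"

# Member row as posted in this thread.
BOOT_A = "0  disk3s2   D53F6A81-89F1-4FB3-86A9-8808006683C2  Online  3000248991744"
# Hypothetical row after a reboot: same UUID, different disk number.
BOOT_B = "0  disk1s2   D53F6A81-89F1-4FB3-86A9-8808006683C2  Online  3000248991744"


def node_for_uuid(listing, uuid):
    """Return the DevNode shown next to the given member UUID, or None."""
    pattern = re.compile(
        r"^\s*(?:\d+|-)\s+(\S+)\s+" + re.escape(uuid) + r"\b",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(listing)
    return match.group(1) if match else None


print(node_for_uuid(BOOT_A, MEMBER_UUID))
print(node_for_uuid(BOOT_B, MEMBER_UUID))
```

Referring to members by UUID in diskutil commands, as the usage text allows, avoids depending on which number a drive happened to get.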


Mike



Jul 6, 2014 5:50 AM in response to Aronis

I've proven the LaCie box is the issue after placing the two old 3 TB disks in the new LaCie box and rebuilding the RAID set. It was stable for three days.


I went ahead and swapped out a drive to see just how one would go about replacing a dead drive with a brand new one.


Disk Utility listed the missing drive as Missing/Failed, and I added the new drive to the RAID by formatting it and then dragging it to the RAID set in the Disk Utility GUI. This resulted in "diskutil appleRAID list" showing the new drive as Spare. After some trial and error, it appears you first need to remove the Missing disk from the list with the "diskutil appleRAID remove" command. Then, if you reboot (or perhaps wait an hour), Disk Utility will begin rebuilding the mirror with the new drive. You can trigger this via the Terminal as well.


Hopefully someone will benefit from my learning process in the future.


LOL.
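Here is a rough sketch of that replacement sequence as a dry run that only prints the commands. The "remove" step is the one described in this thread. The "repairMirror" step is my recollection of the diskutil man page (RAID set first, new disk second) and was not something I ran above, so treat it as an assumption and check the usage text that diskutil prints before relying on it. The UUID placeholder and disk5 are made-up values.

```python
import shlex


def replacement_plan(raid_set, missing_member, new_disk):
    """Return the commands, in order, for swapping a dead mirror member.

    Nothing is executed; the caller decides what to do with the list.
    """
    return [
        # 1. Look at the set and note the Missing/Failed member.
        ["diskutil", "appleRAID", "list"],
        # 2. Drop the Missing/Failed entry (member first, RAID set second).
        ["diskutil", "appleRAID", "remove", missing_member, raid_set],
        # 3. ASSUMPTION: repairMirror takes the RAID set, then the new disk.
        ["diskutil", "appleRAID", "repairMirror", raid_set, new_disk],
    ]


# Hypothetical values for illustration only.
plan = replacement_plan("disk4", "MISSING-MEMBER-UUID", "disk5")
for step, argv in enumerate(plan, start=1):
    print(step, " ".join(shlex.quote(arg) for arg in argv))
```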


Mike

Aug 12, 2014 5:39 AM in response to Aronis

I found this article which explains in more detail the timeout that John suggested.


http://wdc.custhelp.com/app/answers/detail/a_id/10633/~/reconnecting-a-my-book-thunderbolt-duo-causes-the-raid-to-rebuild-itself


The timeout could be the issue you are facing. Perhaps the fix will help. My RAID has been solid for years until this timeout happened to me, out of the blue.
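For reference, the diskutil man page (as I recall it) lists two keys for "diskutil appleRAID update": AutoRebuild, which takes 0 or 1, and SetTimeout, which is the number of seconds the system waits for a missing device before degrading a mirror. Whether SetTimeout is the exact fix the article describes is an assumption on my part. The sketch below only builds and prints the command. The 60 second value and the helper name are illustrative, not a recommendation.

```python
import shlex

# Keys recalled from the diskutil man page; treat the list as an assumption.
ALLOWED_KEYS = ("AutoRebuild", "SetTimeout")


def update_command(key, value, raid_set):
    """Build argv for: diskutil appleRAID update <key> <value> <RAID set>."""
    if key not in ALLOWED_KEYS:
        raise ValueError("unknown key: %r" % key)
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("value must be an integer")
    if key == "AutoRebuild" and value not in (0, 1):
        raise ValueError("AutoRebuild takes 0 or 1")
    if key == "SetTimeout" and value <= 0:
        raise ValueError("SetTimeout takes a positive number of seconds")
    return ["diskutil", "appleRAID", "update", key, str(value), raid_set]


# Dry run: print the commands rather than executing them.
for argv in (
    update_command("SetTimeout", 60, "disk4"),  # 60 is an illustrative value
    update_command("AutoRebuild", 1, "disk4"),
):
    print(" ".join(shlex.quote(arg) for arg in argv))
```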
