
Software (OSX) Concatenated Drive (JBOD) Issues

770 Views 11 Replies Latest reply: Jan 30, 2013 11:37 PM by Seijo
Seijo
Nov 25, 2012 12:36 AM

We have 3 JBOD drives set up: one as a shared network drive (12TB), and two 8TB units, one of which is the primary drive and the other a backup to that drive, which also holds the individual system backups for all the networked computers.  The reason for this configuration is that we deal with and encode RAW video, HD and otherwise, and we need ALL the files to be instantly accessible and searchable (with basic metadata), and spotlight is such a POS that it has been unusable since its inception.  I don't need lectures on spotlight or the use/build of JBODs/RAIDs, so if, like on many of the searches I made here, that is your  goal... please don't.

 

At issue is that, for the first time in 3 years, we are now experiencing problems with the primary JBOD drive.  Unfortunately, there is no software that comes with OSX, or among about 15 different backup and recovery packages, that allows any form of information retrieval or maintenance on such drives... no SMART info is available from any of the 12 individual SATA drives involved, no software we have sees the drives individually, and most don't even recognize the drives as a single JBOD.  Most software only sees a drive if it is mounted on the desktop, and only sees it as it is presented via the OSX system (so CCC and Disk Warrior see them sufficiently for their job, but iPartition fails miserably in any form, and it and TechTool are horrifyingly dangerous to operate with a JBOD mounted).

This means we have no way of determining which of the 4 drives in the primary set is failing, only that as it does so, it is leaving bad sectors which are being ignored by both the system and any hardware on the actual drive... our only notification that this was happening was that when the file was accessed (the actual contents read, not just the headers and TOCs), the resulting I/O error force-unmounted the drive from the system, claiming we had disconnected it.  This occurred during two backup attempts, which led us to the diagnosis... as soon as we removed the files from the backup process, the I/O error and forced dismount stopped.  This was then confirmed by just attempting to read the file without any actual loading... the drive dismounted halfway through the read (but as the indicator lights on three of the drives were blinking, there was no way to figure out which one).

We then attempted to use three different programs to locate the affected sectors (using a sector scan, which is supposed to locate AND mark the sectors). But, as scanning 8TB would take a day via eSATA (god forbid FireWire or USB), we left the various programs running overnight (or over the weekend), and each time we got back we discovered that none of them had survived once they finally came across the affected sector(s).  When the drive got dismounted, every one of the programs crashed out (IIRC one caused OSX or the GUI to hang, and a cold reboot was required).  Thank the Lord for log files, even though they aren't very detailed.

 

Also note that the main system is a G5 quad running OSX 10.4.11, under hardware and software necessity, and that, even though we have 10.5, it is only used during times when other work is not on-going (typically during only about 16 hours on Sundays). So software CANNOT be MacIntel, or require OSX over 10.5.

 

So, with this knowledge, the questions are:

1. Is there a software package that can recognize and locate bad sectors, without crashing/unceremoniously dismounting the drive?

2. Is there a software package that can locate the exact position of the data of a file (without actually reading the data itself), across a JBOD or other RAID configuration (drive, tracks, sectors)?  Is this available via a terminal/unix command?

3. Is there recovery software that can access the files on the good drives in the set? (don't need this now, but might need it in the future)

4. Is there a general drive software that is actually aware of Apple's software RAID set-ups, and is configured to deal with them?

5. Is there any way to retrieve SMART data from drives across eSATA or a chipset that doesn't normally do so in a drive enclosure(s)?

 

Thanks

Mac OS X (10.4.11), Several incl. G5-quad, G4 AGPs, G3
  • BobHarris Level 6 (12,520 points)
    Nov 25, 2012 4:40 AM (in response to Seijo)

    If you have a PC available, you could try SpinRite from GRC.com. You would put the suspect drives in the PC one at a time and run SpinRite on them. If the drive and its data can be recovered, it will do so. However, once you have your data, you should replace the bad drives.

  • etresoft Level 7 (23,915 points)
    Nov 25, 2012 6:25 AM (in response to Seijo)

    I don't know the answer to any of your questions. Being a Mac user, I don't want to deal with low-level issues like that. Using a Mac allows me to focus on the big picture. I will let the Linux people worry about disk sectors.

     

    What's wrong with Spotlight? I think that would have been the better question to address before you got to this point. If there is something going wrong with your Mac, it is best to investigate and solve the problem before you put 18 TB of data on it.

     

    I see now that you have issued a pre-emptive plea to ignore the big picture and focus just on the disk sectors. Sorry, no can do. You have said that your drive is failing, but is not yet dead. This means that you still have a chance to save most of your data. Stop trying to look for the failure. JBOD is just a neat trick. You aren't supposed to use it. Get yourself a real RAID array and move your data to it. Do that now. I will issue my own pre-emptive plea to avoid any petty arguments on a discussion forum. It is your data that is being corrupted here, not mine. You can have a real RAID array in a few days. In Boston, someone local might be able to do it. Then your problem is solved - gone. If you have that much data, you might also want to look into a small HSM. You have gone past consumer equipment here.

  • etresoft Level 7 (23,915 points)
    Dec 1, 2012 8:15 PM (in response to Seijo)

    Seijo wrote:

     

    Unbelievable! You are worse than useless, and a condescending twit, as well!

    And you are insulting someone who is trying to help you save your data before it is too late.

     

    You clearly didn't read this, and you provided nothing but your senseless and ******** derision... here's something prime you ignored:

    "I don't need lectures on spotlight or the use/build of JBODs/RAIDs, so if, like on many of the searches I made here, that is your  goal... please don't."

    If you wipe the froth from your mouth and re-read my reply, you will see that I did read your plea and decided it was more important to help you save your data than help you fiddle with disk sectors while your files were being lost forever.

     

    1. If you don't know the answer to any of the questions, *** did you even post for?

    To help you save your data.

     

    2. If you only want to work on the big picture, why are you even looking at low-level help requests?

    Because such low-level help requests are almost always misguided. People go down a rabbit hole and get stuck on minutiae and miss easy, higher-level solutions.

     

    3. Why do only Linux people have to worry about sector issues?

    Because that's what they do with their spare time.

     

    4. What's wrong with spotlight? There's over 20,000 posts here telling you why!  Go read them!

    This is a user-to-user tech support forum. Most posts in this discussion forum are about problems. Usually those problems are created by the user through misconfiguration. Sometimes they are hardware failures. Your problem seems like some of both. In any event, the forums are a poor way to judge the relative quality of any particular technology. Anything used as much as Apple devices is bound to have many reports of problems. In the big picture of hundreds of millions of users, thousands of such problem reports are insignificant.

     

    6. We've had over 8TB running constantly under this system for over 3 years, there is nothing wrong with it.

    If there is nothing wrong with it then why are you posting here?

     

    You obviously don't have a life, and you are a total @$$.  Your setup and beliefs about computer use are apparently your own, and have no basis in reality for thousands of us out here.

    You're welcome. Enjoy what files you have left.

  • Christopher Murphy Level 2 (470 points)
    Dec 2, 2012 3:06 PM (in response to Seijo)

    What you're looking for is smartmontools. That's the set of tools that includes smartctl for polling the SMART information off ATA disks. So long as the drives are directly connected to the motherboard or to a PCI controller, it's easy to use. If the drives are in an enclosure of any type, then it gets a lot trickier because most of the bridge chipsets fail to pass through the SMART command set.

     

    Getting smartmontools on OS X is not easy. If you're familiar with MacPorts you can get it from there, but of course they provide source, not binaries, so you need Xcode to compile it for 10.4 or 10.5. I have a DMG/PKG installer of smartmontools ppc for 10.5. I doubt it will work on 10.4.

     

    The least invasive way to get access to smartmontools, requiring no installation, is to download a Linux distribution for PPC and boot it as a LiveCD. I'd vote for Fedora 17 because I know it contains smartmontools already, at least for i386 and x86_64, and I don't see why the PPC build wouldn't include it. Even if it doesn't, it's straightforward to install: even though your computer is booted from a CD, you can still install software (it is installed in RAM, so it goes away every time you reboot), which is a huge plus of a LiveCD. I can't tell you whether you need the ppc64 or the 32-bit version, however. I think for the G5 the 64-bit is what you want.

     

    In any case, smartctl needs to be pointed to the actual physical disk. An array is presented to the user by the OS as a sort of logical volume. That logical volume is made up of multiple physical disks. If you are on OS X you use:

     

    diskutil list


    You will see the individual disks and you will see the array, each will have different /dev/diskX designations.

     

    From a linux LiveCD you'd get your listing of disks using:

     

    parted -l

     

    That's a lower case L. And the designation there is /dev/sdX. You have to run smartctl on each individual physical disk making up the concat array. I do this both on OS X on mounted volumes, as well as on mounted or unmounted volumes in linux. So that part doesn't matter. It's a read only command in any case. The basic command to get attributes on the disk and find out which one has problems is:

     

    smartctl -a /dev/diskX

    or

    smartctl -a /dev/sdX

     

    The first is OS X's disk designation and the 2nd is linux. OS X uses numbers for devices, e.g. disk0, disk1; and linux uses letters, e.g. sda, sdb, etc.
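
    As a rough sketch of that per-disk check (assuming smartmontools is installed and the drives are connected in a way that passes SMART through), the attribute table from smartctl -A can be filtered down to the counters that track bad sectors. The function names smart_summary and check_members are made up for illustration, and the three attribute names are the ones smartctl commonly prints; some drive vendors label them differently.

```shell
# smart_summary: read `smartctl -A` output on stdin and print NAME=RAW_VALUE
# for the attributes that track reallocated or pending bad sectors.
# (Hypothetical helper; attribute names vary somewhat between vendors.)
smart_summary() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $10    # field 10 is the RAW_VALUE column
    }'
}

# check_members: print the summary for each member disk passed as an argument.
# Nothing is run until you call it, e.g.:
#   check_members /dev/diskX /dev/diskY
check_members() {
    for dev in "$@"; do
        echo "== $dev"
        smartctl -A "$dev" | smart_summary
    done
}
```

    Run against each member of the concat set, a disk whose raw counts are non-zero (and growing between runs) is the likely culprit.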

     

    But now that I've written all of this, the bad news is I don't think you can remove a disk from an OS X software linear array, unless it's a RAID 1+linear nested array. As far as I know, there is no command like LVM's pvmove to migrate the data from the bad disk to a new disk, whether the array is online or offline. I think your only choice is going to be to back up, find the bad drive, replace it, recreate the array from scratch, and then restore. This is a LOT easier to deal with on Linux using LVM and pvmove for this exact situation.

     

    BTW, for DAS and video you're better off with RAID 0 than linear/concat. At least with RAID 0 you get scalable performance, whereas with linear/concat your performance is limited to that of a single disk.

  • Christopher Murphy Level 2 (470 points)
    Dec 2, 2012 3:30 PM (in response to Seijo)

    Aha, so now I see that this is in an external enclosure. You have to know what chipset it is, then you have to read the exhaustive documentation for smartctl to find out what parameters to use to get through the chipset, assuming it's even possible. You're better off pulling the drives from the enclosure, direct connecting them to the SATA port on the motherboard, and polling the drives individually that way.

     

    Also, I think your storage expectations are flawed. You're talking enterprise storage expectations, but the hardware you've got isn't enterprise. The way to do it correctly is direct connect SATA to a SATA PCI card, no chipset in between. Or you build a NAS and run smartctl/smartd on the NAS to directly poll the drives. Enclosure bridge chipsets almost universally are junk. So expecting any utility to automagically deal with junk is not a good expectation. It isn't the fault of Disk Utility when bridge chipsets don't pass through all ATA commands, including SMART commands, to/from the drives.

  • Christopher Murphy Level 2 (470 points)
    Dec 2, 2012 3:45 PM (in response to Seijo)

    Wait wait wait. I should be smarter than this. If this is a Disk Utility software concat RAID, then obviously it sees individual disks. And if you're getting a URE (unrecoverable read error) on a bad sector, then dmesg should report the device and sector of the failed read. So you should reproduce the read failure by reading the file causing the problem, and then go to Terminal and type

    sudo dmesg

    And now sift through that for a read error. You could even try:

    sudo dmesg | grep error

    And see if any lines show up. Other variations are possible too:

    sudo dmesg | grep sector

    sudo dmesg | grep bad

    sudo dmesg | grep read

     

    Although that's just a guess. I'm not sure how XNU reports bad sectors. But in any case, you can match the device it complains about getting an error from with the results from Apple System Profiler. There you will find the BSD disk name as disk0, disk1, disk2, and also what the serial number is. And you can then correlate the disk with bad sectors with a disk serial number and then remove the offending disk.
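
    A small filter makes the sifting less tedious. This is only a sketch: it assumes the kernel writes lines of the form "disk3s2: I/O error." (the exact wording may differ between OS X versions), and the function name io_error_disks is hypothetical. It reads log text on standard input and prints each whole-disk name with the number of matching lines, most frequent first.

```shell
# io_error_disks: read kernel log text on stdin and print "diskN COUNT"
# for every disk that logged an I/O error, busiest first.
# ASSUMPTION: error lines contain "diskN" (optionally with a slice suffix
# such as s2) followed by the text "I/O error"; adjust the pattern if
# your system words it differently.
io_error_disks() {
    grep 'disk[0-9][0-9]*.*I/O error' |
        sed 's/.*\(disk[0-9][0-9]*\).*/\1/' |   # keep only the whole-disk name
        sort | uniq -c | sort -rn |             # count lines per disk
        awk '{ print $2, $1 }'                  # print as "name count"
}
```

    Typical use would be sudo dmesg | io_error_disks right after reproducing the failure. The disk name at the top of the list is the one to look up in Apple System Profiler.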

  • Christopher Murphy Level 2 (470 points)

    Another thing to do is to simply overwrite the offending file with a known good copy. This will cause the file system to want to replace those sectors. When the disk attempts to write to the bad sector, if there is a persistent write failure, then the disk firmware will remove that sector from use and replace it with a reserve sector.
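
    One way to sketch that overwrite from Terminal, assuming the known good copy lives on the backup volume and is the same size as the damaged file: dd with conv=notrunc writes over the existing file without truncating it first, which makes it more likely (though not guaranteed; that depends on the file system) that the same blocks get rewritten. The function name overwrite_in_place is made up for illustration.

```shell
# overwrite_in_place GOOD TARGET
# Write GOOD over TARGET without truncating TARGET first, then compare
# the two files byte for byte. Returns non-zero if the write or the
# comparison fails. (Hypothetical helper name.)
overwrite_in_place() {
    good=$1
    target=$2
    dd if="$good" of="$target" bs=1048576 conv=notrunc 2>/dev/null || return 1
    cmp -s "$good" "$target"
}
```

    It would be called as overwrite_in_place /path/to/backup/file /path/to/primary/file. Note that the final comparison reads the target back, so if the sector is still bad the read can fail again; in that case the drive did not manage to reallocate it.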

     

    Bad sectors are somewhat normal and not inherently a reason for replacing a drive. But only by looking at the full SMART attribute data is it possible to determine whether there are bigger problems going on with a drive than just a few bad sectors that need to be written to in order to force reallocation.

     

    Consumer SATA disks really should be zeroed on a regular basis (once every year or two), or ideally erased with the ATA Secure Erase command (which can be issued from a Linux LiveCD with the hdparm command). Secure Erase is quite a bit faster than writing zeros, and can be done to multiple disks at once. The firmware itself does the erasure, so bus/controller bandwidth is a non-factor.
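
    For reference, the usual hdparm sequence looks roughly like the following. The sketch only prints the commands rather than running them, since a Secure Erase destroys everything on the disk. The function name secure_erase_plan and the throwaway password p are assumptions for illustration; /dev/sdX stands for the real device. The first command is there so you can confirm in its output that the drive supports the security feature set and is not frozen.

```shell
# secure_erase_plan DEVICE [PASSWORD]
# Print (do NOT run) the hdparm steps for an ATA Secure Erase of DEVICE.
# PASSWORD is a temporary user password; it defaults to "p".
# (Hypothetical helper name; the printed commands are destructive if run.)
secure_erase_plan() {
    dev=$1
    pass=${2:-p}
    echo "hdparm -I $dev"                                          # check security status
    echo "hdparm --user-master u --security-set-pass $pass $dev"   # set temporary password
    echo "hdparm --user-master u --security-erase $pass $dev"      # start the erase
}
```

    Calling secure_erase_plan /dev/sdX prints the three commands in order, to be run as root by hand once you are sure the device letter is right.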

This site contains user submitted content, comments and opinions and is for informational purposes only. Apple disclaims any and all liability for the acts, omissions and conduct of any third parties in connection with or related to your use of the site. All postings and use of the content on this site are subject to the Apple Support Communities Terms of Use.