233 Replies Latest reply: Mar 9, 2015 12:02 PM by Jyri Palm
  • etresoft Level 7 (26,675 points)

    If you are having a SMART failure, your drive is dead. Get a new one.

  • dburr Level 1 (15 points)

    etresoft wrote:

     

    If you are having a SMART failure, your drive is dead. Get a new one.

     

    No SMART failure; in fact, no drive failure whatsoever.  As I stated in my original message, I ran the drive through a battery of tests and did not find a single bad block.  It looks like SMARTreporter is mistakenly interpreting the SQLite error message as an indicator of drive failure.

  • etresoft Level 7 (26,675 points)

    Or perhaps both SQLite and SMARTreporter are noticing the same I/O failure.

  • Nathaniel Madura Level 1 (10 points)

    I am also seeing this problem on a "Relatively new" (< 1 yr old) Mac Pro system running 10.7.4 Server. I have looked into RAM problems (swapping RAM in and out) and disk problems, and have reapplied combo updaters, yet the problem persists.

     

    I have even hashed all the libs and compared them to a different machine, and all the hashes match!
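
    In case anyone wants to repeat that comparison, here is a minimal sketch of one way to do it. It is not necessarily how the hashes above were produced. The function name hash_manifest is made up for illustration, and the sketch assumes that either shasum or sha256sum is installed.

```shell
#!/bin/sh
# Sketch: print a sorted "hash  path" manifest for every regular file under a directory.
# hash_manifest is a hypothetical helper name, not a tool mentioned in this thread.
hash_manifest() {
    dir=$1
    # cd first so the manifest holds relative paths that are comparable across machines.
    if command -v shasum >/dev/null 2>&1; then
        (cd "$dir" && find . -type f -exec shasum -a 256 {} + | sort -k 2)
    else
        # Assumed fallback for systems that only ship sha256sum.
        (cd "$dir" && find . -type f -exec sha256sum {} + | sort -k 2)
    fi
}
```

    Run it against the same directory on both machines (for example, hash_manifest /usr/lib > libs.txt), copy one manifest across, and diff the two files. Note that -type f skips symlinks, so this only compares file contents; a clean result says nothing about the links themselves.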

     

    It is just the symlinks that are being corrupted. You can execute the following to see which symlinks aren't working...

     

     

    $ sudo find -L . -type l -ls

     

     

    I get results like the following (trimmed for readability)

     

    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/Resources -> ?Ν?ze??f???d!?????????   
    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/SystemConfiguration -> X?Ϟ????Ȣġ|?:?ԡ??ũ.??ޅ܅??   
    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/Versions/Current -> ? 

     

    If I fix a symlink, it seems to get corrupted again within several days.
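
    To pick out links whose target looks like the garbage above rather than links that are merely dangling, something along these lines might help. It is only a sketch: suspicious_links is a made-up name, and it assumes that a healthy target consists of printable ASCII only, so legitimately localized file names will be flagged as well.

```shell
#!/bin/sh
# Sketch: print symlinks whose stored target contains bytes outside printable ASCII.
# suspicious_links is a hypothetical helper name. A hit is a hint, not proof of corruption.
suspicious_links() {
    find "$1" -type l 2>/dev/null | while IFS= read -r link; do
        # readlink prints the raw target text stored in the link.
        target=$(readlink "$link") || continue
        # Delete every printable ASCII byte (space through tilde); anything left is unexpected.
        leftover=$(printf '%s' "$target" | LC_ALL=C tr -d ' -~')
        if [ -n "$leftover" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

    For example, sudo sh -c '. ./suspicious.sh; suspicious_links /System/Library/Frameworks' would list only the links with odd-looking targets, assuming the function was saved in a file called suspicious.sh.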

     

    Message was edited by: Nathaniel Madura

    FWIW, I am also on a software RAID 0+1.

  • Smeevil Level 1 (0 points)

    Hi,

     

    It seems that we (barttenbrinke and I) have isolated the problem.

    It definitely has to do with using a form of RAID and the boot disk being part of that same RAID.

     

    Here is what we did to isolate it:

    We keep the core/base system on one partition. We created a second partition (on the same RAID) and put all the mutable stuff there: /Applications, /var, /Development (Xcode), etc.

     

    This will not "solve" the problem but will prevent it from happening to critical stuff.

    The corruption has not happened at all on the second partition.

  • ktwalker69 Level 1 (0 points)

    Nathaniel, that is a great tool for helping to collect data about this problem.  Thanks for that.

     

    Smeevil, my troublesome system has a single, bootable disk that is *not* part of a RAID.  [Also, Jmanis above had this problem without RAID or JBOD.]  My system does have a 12 TB JBOD where I keep data files, comprising three 4 TB Hitachi disks.  My problems used to be with symlinks on the bootable disk pointing to stuff on the JBOD.  However, this system was recently wiped and set up from scratch (for the third time), and it is now managed by our IT department.  The problem has recurred, this time with symlinks on the bootable disk pointing to stuff on the bootable disk.

     

    In an attempt to summarize everything above:

     

    Disks were replaced.  Problem came back.

    Different disks (in terms of manufacturer and size) see same problem.

    RAM was replaced.  Problem came back.

    Disks/partitions were wiped.  Came back.

    Occurs on "Mid 2010 Mac Pro" and "Mid 2008 iMac" and "Relatively new (< 1 yr old)" Mac Pro.

    Disks were partitioned differently:

      - boot disk not part of RAID: problem exists

      - boot disk part of RAID: problem exists

      - non-bootable disks: problem does not exist

    Snow Leopard: problem does not exist

    Lion and Mountain Lion: problem exists

    Problem is not easily reproducible and may take days to weeks for a fixed symlink to get recorrupted.

    Problem occurs on some machines, but not others running the same OS and JBOD/partitioning structure.

     

    Speak up if I got something wrong, and I'll edit it.

     

    Just ran

     

    sudo find -L / -type l -ls | grep '?'

    and

    sudo find -L / -type l -ls | grep '@'

     

    on 4 machines that also have 10.7.4 installed, each with a single drive holding the boot partition and a larger JBOD RAID partition containing random data files.  Unfortunately, I did *not* find this problem on any other machine.  It is possible that there was an update on the other machines that I did not apply (to keep them from slowing down).

     

    Just for full disclosure, this is a "Mac Pro Mid 2010" Quad-Core Intel Xeon using 4 Hitachi 4 TB disks and 32 GB RAM.

     

    Does anyone know (how about it, etresoft?) whether the inode structure is supposed to be updated routinely via some system plist?  If so, what processes are involved?

     

    I don't think this is going to get solved on this forum.  Has anyone submitted a proper bug report?  I think the only way this is going to get solved is for an Apple engineering team to get access to a computer on which the problem is occurring consistently, which I think requires a bug report submission.  My production system is working at the moment, but others are not.  As mentioned, this is becoming an expensive problem for some people/groups.

     

    Kris

  • dburr Level 1 (15 points)

    Smeevil wrote:

     

    Hi,

     

    It seems that we (barttenbrinke and I) have isolated the problem.

    It definitely has to do with using a form of RAID and the boot disk being part of that same RAID.

     

    Unfortunately that pattern does not fit my situation.  I am indeed using a RAID (Apple's software RAID 0) but my boot disk is NOT part of the RAID (it's on its own drive).

  • etresoft Level 7 (26,675 points)

    As twtwtw said above, symbolic links are just text files with special attributes. The only way for this type of corruption to occur is for the disk blocks containing the symbolic link data to be overwritten with other data. I don't see how that kind of corruption could happen without corrupting any other type of file. The problem would have to lie deep inside the underlying RAID structure if a relatively higher-level disk repair doesn't see the corruption.

     

    The best you can do is verify that the symbolic links are good and, as soon as you detect that they aren't, drop everything, save your log files, and submit a bug report.
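
    One way to do that verification is to record the links while they are known to be good and compare against that record later. The following is a rough sketch only; snapshot_links, links_unchanged, and the baseline file name in the usage note are all made up for illustration.

```shell
#!/bin/sh
# Sketch: record every symlink with its stored target so a later run can be compared.
# snapshot_links and links_unchanged are hypothetical helper names.
snapshot_links() {
    # Sort in the C locale so two runs list the links in the same order.
    find "$1" -type l 2>/dev/null | LC_ALL=C sort | while IFS= read -r link; do
        # readlink prints the raw target text stored in the link itself.
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

# Exit status 0 if the tree still matches the saved baseline, non-zero otherwise.
links_unchanged() {
    snapshot_links "$1" | diff - "$2" >/dev/null
}
```

    For example, run snapshot_links /System/Library/Frameworks > links.baseline once, then periodically run links_unchanged /System/Library/Frameworks links.baseline || echo "symlinks changed". As soon as that message appears, that is the moment to save the logs.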

  • Jmanis Level 1 (10 points)

    Bumping this post to say that this problem appears to have gone away for me, although I'm not doing the disk-intensive work I was doing before.  At the time I was experiencing the issue, I was doing some compiling and forensic investigations that involved a lot of disk activity. Lately I've only been using the iMac for more routine things and haven't had the problem. I'm curious whether Apple quietly fixed this in one of their patches.

     

    Has anyone else on the thread seen the issue go away or are you still suffering? 

     

    BTW- Here's another way to check for broken symlinks:

    for f in $(find / -type l); do if [ ! -e "$f" ]; then echo "$f"; fi; done 2>&1 | grep -v Permission
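
    One caveat: the for loop above splits find's output on whitespace, so any path containing a space (common on OS X) gets broken into pieces. A variant that reads one path per line could look like the sketch below; broken_links is just a name picked for illustration.

```shell
#!/bin/sh
# Sketch: print symlinks whose target cannot be reached, keeping paths with spaces intact.
# broken_links is a hypothetical helper name.
broken_links() {
    # Discard find's own "Permission denied" noise instead of filtering it with grep.
    find "$1" -type l 2>/dev/null | while IFS= read -r link; do
        # -e follows the link, so it is false when the target does not exist.
        [ -e "$link" ] || printf '%s\n' "$link"
    done
}
```

    Call it as broken_links / (with sudo if you want to cover directories your user cannot read). Paths containing a newline would still be mishandled, which should be rare in practice.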


  • Smeevil Level 1 (0 points)

    We solved the problem by attaching a 2GB FireWire 800 disk to the server which is now its boot disk.

    The original four 1TB drives are in a RAID 0+1, and we do not see any corruption at all anymore.

     

    We do use symlinks on the boot disk to point to directories in the raid and they do not corrupt either.

    I hope this info will help anyone having the same problems....

  • dburr Level 1 (15 points)

    Not going to say that the problem has gone away for sure, lest I jinx myself, but I haven't noticed this happening in the recent past.  Since 10.8.2, possibly 10.8.1.  My usage pattern hasn't changed (if anything it has been a bit more intensive than usual - just released two new apps and updates to a few other apps).  So maybe they did quietly release a fix to this in one of the recent OS X updates.  Let's hope so!  *crosses fingers and toes*

  • Iosepho Level 1 (0 points)

    It has NOTHING to do with RAID. Absolutely nothing. It might have to do with having multiple drives in the system, though!

     

    I have experienced it (in Java) on a perfectly ordinary late 2009 iMac (though with a replaced, 3GB HDD) running Mountain Lion.

    I reinstalled Java (good luck doing THAT if you're anything less than a complete UNIX geek), now it seems to be working. I did quite a lot of data hauling before the problem came up, copying stuff from backup drives through USB and Firewire 800.

     

    The hard drive is brand new, so I got a little scared that it might be bad, but it seems the problem is in the Darwin kernel routines. Did someone post a bug report to Apple?

  • Jmanis Level 1 (10 points)

    I agree the RAID configuration is a red herring, and the warnings of hard drive failure are also unfounded. I've not filed a bug report; I checked, but we cannot see bug reports filed by others. I guess the only option is to file one and see if it's closed out as a duplicate or not. At this point, unless someone can confirm the problem still exists, I don't see a point and won't file one myself, as the issue seems to have gone away and I've not done anything on my side.

  • Iosepho Level 1 (0 points)

    (Haha of course I meant 3 TB hard drive...)

  • Iosepho Level 1 (0 points)

    I think I'll file one though, because the problem has NOT gone away, it's just pretty transient and hard to catch. I had my corruption happen a few weeks ago, on Mountain Lion!

    I'm wondering though, what is currently the preferred way to file bugs with Apple? Their website is a bloody maze.
