ktwalker69

Q: symbolic links get corrupted by system process?

Greetings Folks,

 

This was posted in another forum, so I'm reposting two messages here:

 

I am having a problem with symbolic links getting corrupted.  I have a new Mac Pro running 10.7.3.  I have defined symbolic links

 

/Users/walker/G2S -> /Volumes/L2A/G2S [this is pointing to a different partition on the same JBOD RAID]

/home -> /Users

 

The second link was created after unmounting /home and removing it from the /etc/auto_master file.

 

Both symbolic links worked for several days.  But then for some reason, without a reboot, the links became corrupted:

 

> pwd

/Users/walker

> ls -al G2S

lrwxr-xr-x  1 walker  staff  16 Mar 24 03:08 G2S -> X??G???Gҡ?G???G

> cd G2S

G2S: No such file or directory.

 

Same nonsensical definition for /home link.  I repeat, this did not happen after a reboot.  It first happened on /home.  I thought that might have been related to a new OS handling of the "/home" label.  So I deleted the /home link and did a clean reboot.  The G2S link was created after that reboot, not before.

 

After the above two problems happened, I created a new symbolic link

 

/Users/walker/G2S2 -> /Volumes/L2A/G2S

 

I then did not use this new symbolic link in any of my processing scripts.  A few weeks went by, then this link somehow got corrupted too:

 

lrwxr-xr-x   1 walker  staff     16 Apr  2 17:22 G2S2 -> 꺄G???Gĺ?Gú?G

 

Does anyone here know how symbolic links are managed on a Mac (any process that controls their linking?), or have any information to help me figure out how to fix this?  For example, could it be due to bad RAM?  I have 32 GB.

 

Thank you,

Kris Walker

Mac Pro, Mac OS X (10.7.3)

Posted on Apr 20, 2012 3:47 PM

  • by etresoft,

    etresoft Aug 14, 2012 5:49 PM in response to dburr
    Level 7 (29,380 points)

    If you are having a SMART failure, your drive is dead. Get a new one.

  • by dburr,

    dburr Aug 14, 2012 6:03 PM in response to etresoft
    Level 1 (15 points)

    etresoft wrote:

     

    If you are having a SMART failure, your drive is dead. Get a new one.

     

    No SMART failure; in fact, no drive failure whatsoever.  As I stated in my original message, I ran the drive through a battery of tests and did not find a single bad block.  It looks like SMARTreporter is mistakenly interpreting the sqlite error message as an indicator of drive failure.

  • by etresoft,

    etresoft Aug 14, 2012 6:37 PM in response to dburr
    Level 7 (29,380 points)

    Or perhaps both SQLite and SMARTreporter are noticing the same I/O failure.

  • by Nathaniel Madura,Helpful

    Nathaniel Madura Aug 20, 2012 7:44 AM in response to ktwalker69
    Level 1 (10 points)

    I am also seeing this problem on a "Relatively new" (< 1 yr old) Mac Pro system running 10.7.4 Server. I have looked into RAM problems (swapping RAM in/out) and disk problems, and have reapplied combo updaters, yet the problem persists.

     

    I have even hashed all of the libs and compared them to a different machine, and all the hashes match!

     

    It is just the sym links that are being corrupted. You can execute the following to see what sym links aren't working...

     

     

    $ sudo find -L . -type l -ls

     

     

    I get results like the following (trimmed for readability)

     

    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/Resources -> ?Ν?ze??f???d!?????????   
    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/SystemConfiguration -> X?Ϟ????Ȣġ|?:?ԡ??ũ.??ޅ܅??   
    lrwxr-xr-x /System/Library/Frameworks/SystemConfiguration.framework/Versions/Current -> ? 

     

    If I fix a sym link it seems to get corrupted (again) in several days.

     

    Message was edited by: Nathaniel Madura

    FWIW I am also on a Software RAID 0+1

  • by Smeevil,

    Smeevil Aug 20, 2012 7:49 AM in response to Nathaniel Madura
    Level 1 (0 points)

    Hi,

     

    It seems that we (barttenbrinke and I) isolated the problem.

    It has absolutely to do with using a form of RAID and the boot disk being a part of that same RAID.

     

    What we did to isolate it:

    Have the core / base system on one partition. Create a second partition (on the same RAID) and there we put all the mutable stuff like /Applications /var /Development (xcode) etc.

     

    This will not "solve" the problem but will prevent it from happening to critical stuff.

    The corruption has not happened at all on the second partition.

  • by ktwalker69,

    ktwalker69 Aug 20, 2012 12:39 PM in response to Nathaniel Madura
    Level 1 (0 points)

    Nathaniel, that is a great tool for helping to collect data about this problem.  Thanks for that.

     

    Smeevil, my troublesome system has a single, bootable disk that is *not* part of a RAID.  [Also, Jmanis above had this problem without RAID or JBOD].  My system does have a 12 TB JBOD where I keep data files, comprising three 4 TB Hitachi disks.  My problems used to be with symlinks "on the" bootable disk pointing to stuff on the JBOD.  However, recently this system was wiped and started over (3rd time).  It is now being managed by our IT department.  The problem has reoccurred on the bootable disk pointing to stuff on the bootable disk.

     

    In an attempt to summarize everything above:

     

    Disks were replaced.  Problem came back.

    Different disks (in terms of manufacturer and size) see same problem.

    RAM was replaced.  Problem came back.

    Disks/partitions were wiped.  Came back.

    Occurs on "Mid 2010 Mac Pro" and "Mid 2008 iMac" and "Relatively new (< 1 yr old)" Mac Pro.

    Disks were partitioned differently:

      - boot disk not part of RAID: problem exists

      - boot disk part of RAID: problem exists

      - non-bootable disks: problem does not exist

    Snow Leopard: problem does not exist

    Lion and Mountain Lion: problem exists

    Problem is not easily reproducible and may take days to weeks for a fixed symlink to get recorrupted.

    Problem occurs on some machines, but not others running the same OS and JBOD/partitioning structure.

     

    Speak up if I got something wrong, and I'll edit it.

     

    Just ran

     

    sudo find -L / -type l -ls | grep '?'

    and

    sudo find -L / -type l -ls | grep '@'

     

    on 4 machines that also have 10.7.4 installed with a single drive having the boot partition and a larger JBOD RAID partition containing random data files.  Unfortunately, I did *not* find this problem on any other machine.  It is possible that there was an update on the other machines that I did not apply (to keep them from slowing down).
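
    The two grep filters above depend on how ls happens to render unprintable bytes. A minimal sketch of a more direct check reads each link's stored target with readlink and flags any byte outside printable ASCII. The function name suspect_links is hypothetical, and the sketch assumes that legitimate targets on the machine are plain ASCII paths and that no file name contains a newline.

```shell
#!/bin/sh
# Print every symlink under directory $1 whose stored target contains a byte
# outside printable ASCII (space through tilde).
# Assumption: valid link targets on this system are plain ASCII paths.
suspect_links() {
    find "$1" -type l 2>/dev/null | while IFS= read -r link; do
        # readlink prints the stored target text without following the link.
        target=$(readlink "$link") || continue
        # LC_ALL=C makes grep compare raw bytes rather than locale characters.
        if printf '%s' "$target" | LC_ALL=C grep -q '[^ -~]'; then
            printf '%s\n' "$link"
        fi
    done
}
```

    After loading the function into a shell, suspect_links /Users/walker would list only links whose targets look like the garbage shown earlier in the thread. Links that are merely dangling, with an intact target path, are not reported.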

     

    Just for full disclosure, this is a "Mac Pro Mid 2010" Quad-Core intel Xeon using 4 Hitachi 4 TB disks and 32 GB RAM.

     

    Does anyone know (how about it, etresoft?) if the inode structure is supposed to be updated routinely via some system plist?  If so, what processes are involved?

     

    I don't think this is going to get solved on this forum.  Has anyone submitted a proper bug report?  I think the only way this is going to get solved is for an Apple engineering team to get access to a computer in which the problem is occurring consistently, which I think requires a bug report submission.  My production system is working at the moment, but others are not.  As mentioned, this is becoming an expensive problem for some people/groups.

     

    Kris

  • by dburr,

    dburr Aug 20, 2012 12:45 PM in response to Smeevil
    Level 1 (15 points)

    Smeevil wrote:

     

    Hi,

     

    It seems that we (barttenbrinke and I) isolated the problem.

    It has absolutely to do with using a form of RAID and the boot disk being a part of that same RAID.

     

    Unfortunately that pattern does not fit my situation.  I am indeed using a RAID (Apple's software RAID 0) but my boot disk is NOT part of the RAID (it's on its own drive).

  • by etresoft,

    etresoft Aug 20, 2012 1:37 PM in response to ktwalker69
    Level 7 (29,380 points)

    As twtwtw said above, symbolic links are just text files with special attributes. The only way for this type of corruption to occur is for the disk blocks containing the symbolic link data to be overwritten with other data. I don't see how that kind of corruption could happen without also corrupting other types of files. The problem would have to lie deep inside the underlying RAID structure if a relatively higher-level disk repair doesn't see the corruption.
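
    To illustrate the "text file" point, here is a small sketch that creates a throwaway link and reads back what it stores. The temporary directory is an assumption made for the demo; the target path is the one from the original post and does not need to exist.

```shell
#!/bin/sh
# Create a throwaway symlink and show that what it stores is just the path text.
demo_dir=$(mktemp -d)
target=/Volumes/L2A/G2S            # path from the original post; need not exist
ln -s "$target" "$demo_dir/G2S"

# readlink prints the stored text without following the link.
stored=$(readlink "$demo_dir/G2S")
printf 'stored target: %s\n' "$stored"
# For an ASCII path, the character count equals the byte count.
printf 'length: %s\n' "${#stored}"

rm -r "$demo_dir"
```

    The path /Volumes/L2A/G2S is 16 characters long, which matches the size of 16 shown by ls -al in the original post. The corrupted listing still reports 16, which would be consistent with the stored bytes being overwritten in place rather than the link being recreated.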

     

    The best you can do is verify that the symbolic links are good and, as soon as you detect that they aren't, drop everything, save your log files, and submit a bug report.
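
    One way to do that verification is to record a baseline of every link and its target while things are known to be good, then compare against it on a schedule. The following is a rough sketch only; the function names snapshot_links and check_links and the baseline file location are hypothetical, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Write one "link<TAB>target" line for every symlink under directory $1,
# sorted so that two snapshots can be compared line by line.
snapshot_links() {
    find "$1" -type l 2>/dev/null | while IFS= read -r link; do
        printf '%s\t%s\n' "$link" "$(readlink "$link")"
    done | LC_ALL=C sort
}

# Compare the current state of directory $1 against baseline file $2.
# Prints the differences and returns non-zero if anything changed.
check_links() {
    snapshot_links "$1" | diff "$2" -
}

# Example usage (paths are illustrative):
#   snapshot_links /Users/walker > "$HOME/links.baseline"
#   check_links /Users/walker "$HOME/links.baseline" || echo "symlink changed, save logs now"
```

    Running check_links from a periodic job narrows down when a link changed, which may help correlate the corruption with entries in the system logs.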

  • by Jmanis,

    Jmanis Sep 29, 2012 3:24 AM in response to ktwalker69
    Level 1 (11 points)

    Bumping this post to say that this problem appears to have gone away for me, although I'm not doing the disk-intensive work I was doing before.  At the time I was experiencing the issue I was doing some compiling and forensic investigations that involved a lot of disk activity. Lately I've only been using the iMac for more routine things and haven't had the problem. I'm curious if Apple quietly fixed this in one of their patches.

     

    Has anyone else on the thread seen the issue go away or are you still suffering? 

     

    BTW- Here's another way to check for broken symlinks:

    # List symlinks whose target does not exist; find's errors (e.g. Permission denied) are discarded
    find / -type l 2>/dev/null | while IFS= read -r f; do if [ ! -e "$f" ]; then echo "$f"; fi; done


  • by Smeevil,

    Smeevil Sep 29, 2012 3:30 AM in response to ktwalker69
    Level 1 (0 points)

    We solved the problem by attaching a 2GB FireWire 800 disk to the server which is now its boot disk.

    The original 4 1TB drives are in a Raid0+1 and we do not see any corruption at all anymore.

     

    We do use symlinks on the boot disk to point to directories in the raid and they do not corrupt either.

    I hope this info will help anyone having the same problems....

  • by dburr,

    dburr Sep 29, 2012 1:18 PM in response to Jmanis
    Level 1 (15 points)

    Not going to say that the problem has gone away for sure, lest I jinx myself, but I haven't noticed this happening in the recent past.  Since 10.8.2, possibly 10.8.1.  My usage pattern hasn't changed (if anything it has been a bit more intensive than usual - just released two new apps and updates to a few other apps).  So maybe they did quietly release a fix to this in one of the recent OS X updates.  Let's hope so!  *crosses fingers and toes*

  • by Iosepho,

    Iosepho Oct 4, 2012 9:47 AM in response to ktwalker69
    Level 1 (0 points)

    It has NOTHING to do with Raid. Absolutely nothing. It might have to do with having multiple drives in the system though!

     

    I have experienced it (in JAVA) on a perfectly usual late 2009 iMac (though with a replaced, 3GB HDD) running Mountain Lion.

    I reinstalled Java (good luck doing THAT if you're anything less than a complete UNIX geek), now it seems to be working. I did quite a lot of data hauling before the problem came up, copying stuff from backup drives through USB and Firewire 800.

     

    The hard drive is brand new, so I got a little scared that it might be bad, but it seems the problem is in the Darwin kernel routines. Did someone post a bug report to Apple?

  • by Jmanis,

    Jmanis Oct 4, 2012 1:58 PM in response to Iosepho
    Level 1 (11 points)

    I agree the RAID configuration is a red herring, and the warnings of hard drive failure are also not the cause. I've not filed a bug report; I checked, but we cannot see bug reports filed by others. I guess the only option is to file one and see if it's closed out as a duplicate or not. At this point, unless someone can confirm the problem still exists, I don't see a point and won't file one myself, as the issue seems to have gone away and I've not done anything on my side.

  • by Iosepho,

    Iosepho Oct 4, 2012 2:01 PM in response to Iosepho
    Level 1 (0 points)

    (Haha of course I meant 3 TB hard drive...)

  • by Iosepho,

    Iosepho Oct 7, 2012 1:00 PM in response to Jmanis
    Level 1 (0 points)

    I think I'll file one though, because the problem has NOT gone away, it's just pretty transient and hard to catch. I had my corruption happen a few weeks ago, on Mountain Lion!

    I'm wondering though, what is currently the preferred way to file bugs with Apple? Their website is a bloody maze.
