symbolic links get corrupted by system process?

Greetings Folks,


This was posted in another forum, so I'm reposting two messages here:


I am having a problem with symbolic links getting corrupted. I have a new Mac Pro running 10.7.3. I have defined the following symbolic links:


/Users/walker/G2S -> /Volumes/L2A/G2S [this is pointing to a different partition on the same JBOD RAID]

/home -> /Users


The second link was created after unmounting /home and removing it from the /etc/auto_master file.


Both symbolic links worked for several days. But then for some reason, without a reboot, the links became corrupted:


> pwd

/Users/walker

> ls -al G2S

lrwxr-xr-x 1 walker staff 16 Mar 24 03:08 G2S -> X??G???Gҡ?G???G

> cd G2S

G2S: No such file or directory.


The /home link showed the same nonsensical target. I repeat, this did not happen after a reboot. It first happened on /home. I thought that might be related to the way the new OS handles the "/home" name, so I deleted the /home link and did a clean reboot. The G2S link was created after that reboot, not before.


After the above two problems happened, I created a new symbolic link


/Users/walker/G2S2 -> /Volumes/L2A/G2S


I did not use this new symbolic link in any of my processing scripts. A few weeks went by, and then this link somehow got corrupted too:


lrwxr-xr-x 1 walker staff 16 Apr 2 17:22 G2S2 -> 꺄G???Gĺ?Gú?G


Does anyone here know how symbolic links are managed on a Mac (any process that controls their linking?), or have any information to help me figure out how to fix this? For example, could it be due to bad RAM? I have 32 GB.


Thank you,

Kris Walker

Mac Pro, Mac OS X (10.7.3)

Posted on Apr 20, 2012 3:44 PM

233 replies

Nov 1, 2012 6:52 PM in response to twtwtw

twtwtw wrote:


I seem to recall a similar problem cropping up around the time when they shifted from the HFS system to HFS+ (or maybe I'm having a memory thing, who knows? 😉).


I'm pretty sure there weren't any symbolic links in 1998 🙂


maybe this is just what happens to HFS when it gets close to some critical size limit...


You mean 2 billion files per folder or 8 exabytes in size?

http://support.apple.com/kb/HT2422

Nov 1, 2012 7:04 PM in response to etresoft

etresoft wrote:


I'm pretty sure there weren't any symbolic links in 1998 🙂


Uh, duh. That's pretty durned true. 😊 It might be what you suggested; all I have is a vague memory of minor but inexplicable disk issues around that time. However, it seems likely I put 1 and 1 together and ended up with 4.7, so best to ignore this whole train of thought.

Nov 2, 2012 6:22 AM in response to twtwtw

My theory is still flaky RAID controllers and failing hard drives. There is a wide gamut of reported problems in this thread. The one piece of evidence that is common is corrupted links, usually system-created links that never change. In an HFS+ journaled file system, there is no way for the operating system to even attempt to overwrite those areas of the disk. Any changes are written to new areas on the disk, getting further and further away from those symbolic links created during system install. When the write is successful, it marks the old block as free. That is one reason why HFS+ needs "room to work". If the disk gets full, you can't even change existing files because it doesn't directly overwrite. Someone with a 3+ TB hard drive likely has plenty of room to write changes, so they would never, ever go back and write into blocks near installed symbolic links. Those fancy new hard drives, however, have very high densities and could be getting corrupted by data written "near" those links.


In short, I remain unconvinced that there is any Lion+ change that spontaneously corrupts symbolic links.

Nov 2, 2012 6:39 AM in response to etresoft

PS: those who claim I am spouting nonsense should be filing bug reports with Apple instead of posting anything here. Apple doesn't read these forums. Apple has many such big hard drives and can try to reproduce this problem.


The only entirely useless action is to sit on your hands and "wait for a fix". Anyone who suggests that immediately looks suspicious. If there is some critical system bug that is slowly corrupting data, why on earth would anyone even consider just sitting there and watching it happen, for years, hoping someone at Apple notices this thread and fixes it? Anyone who has enough data to fill a 3 TB hard drive should value that data. If you don't, something looks fishy.

Nov 2, 2012 7:01 AM in response to etresoft

RAID cards have already been ruled out by people experiencing the issue on an iMac internal hard drive.


Flaky hard drives are still suspect but I think unlikely because I would think corruption caused by a drive failure wouldn't be so selective; it would be corrupting more than just symlinks. So far there's no reported evidence of any other file corruption.


As for the "P.S."... lighten up, Francis. No one on this thread is a system engineer. So the best we can do is submit the bug report and figure out a replicable workaround while we wait for the bug fix. The thread itself is valuable in confirming that this really is a bug other people are experiencing in different environments.

Nov 2, 2012 8:07 AM in response to Brian Best


Brian Best wrote:


As for the "P.S."... lighten up, Francis.



Who is Francis?


No one on this thread is a system engineer.

How do you know that? My official title is "Principal Software Engineer", not "System Engineer", but I don't know about anyone else. Don't read too much into titles though. I have known "System Engineers" and "Engineering Fellows" who were nowhere near as clever as some "Software Engineer I"s.


So the best we can do is submit the bug report and figure out a replicable workaround while we wait for the bug fix. The thread itself is valuable in confirming that this really is a bug other people are experiencing in different environments.



There are people experiencing a problem, but there is no confirmation of a "bug". Even if I am entirely wrong and this is a bug in Lion or Mountain Lion, I'm quite confident there will be no fix until 2013 at the earliest.


Keeping your data on a system that is experiencing random corruption, whatever the cause, is simply foolhardy. Even if you are the world's biggest Apple fanboi, if you have evidence that says your data is being corrupted - get it off NOW!

Nov 2, 2012 1:40 PM in response to etresoft

I just did what I said I'd do, I was running badblocks -w on the Mac for days on end. Guess what.


NOTHING.


Now I'm not saying your theory about flaky hardware is necessarily wrong, it just seems implausible to have corruption on one hand, and absolutely no signs of hardware failure on the other.

And neither have I heard of huge masses of people having data corruption on these drives. People tend to love them and write rave reviews about them. If it did crazy stuff like overwrite neighboring blocks randomly, WD would be eating so much crow right now, they'd be suffocating on black feathers. Yet, nothing! No recall, no FAQ item, nothing!


And yes, you are free to "just assume" the hardware is flaky, but this is not an intellectual debate. These things cost MONEY. You cannot take something back for warranty on basis of "just assuming", and even if you sweat out the money for another drive and replace the "assumedly" flaky part, it seems from some posts that sometimes the issue decides to stay around.

If there is a specific heck for infrastructure engineers, I'm pretty sure the Mountain Lion symlink corruption issue is featured in a prominent role among the torments there. This is a fregging Twilight Zone bug.


Even if it's a hardware issue, it would be great to know what causes it and where it really "lives". Does OSX use the drive more aggressively somehow, and bring out hidden errors that badblocks -w or Windows doesn't? Or is there some sort of bug in the SATA implementation of either the controller in certain Macs, or in the drives? Etc.

The question isn't simply "what is causing it", but EXACTLY HOW AND WHY.


As for my solution, I'll downgrade back to 1TB and sell the 3TB drive. Hopefully it will solve the issue. Also, if nobody has actually filed a bug yet, I'm going to the apple dev portal and filing one.

Nov 2, 2012 2:34 PM in response to Iosepho

Iosepho wrote:


I just did what I said I'd do, I was running badblocks -w on the Mac for days on end. Guess what.


NOTHING.


Now I'm not saying your theory about flaky hardware is necessarily wrong, it just seems implausible to have corruption on one hand, and absolutely no signs of hardware failure on the other.


Except for the primary indicator of hardware failure, corrupt data, there is absolutely no sign of hardware failure. Just because it doesn't fail on schedule or on cue, doesn't mean it didn't fail.


There is a fundamental difference between software and hardware. If you have 10 million pieces of hardware, a certain percentage is always going to fail. If you have 10 million examples of software, they are all identical. If one example fails, they must all have the same failure. There may be a certain combination of hardware and software that winds up failing, but that failure will only apply to that specific combination.


Also, if nobody has actually filed a bug yet, I'm going to the apple dev portal and filing one.


I encourage anyone who suspects a problem like this to file a bug report. The goal isn't to swamp Apple with bugs. Apple is already swamped and 6 more from this thread isn't going to make any difference. The goal is to get someone who is willing and able to work with Apple to identify where the failure actually is. The more people who file bug reports, the more likely one of them is to provide meaningful data.

Nov 20, 2012 3:06 PM in response to ktwalker69

Hi all,


I am experiencing the same problem and have spent some time investigating it. Here are my notes:


- symbolic links seem to break if they change frequently. For example, I found the link "program -> MacOS" inside the LibreOffice.app directory broken. I use LibreOffice every day. I never changed this file myself: after installing the software I never touched it. But if you look at it more closely, you will find that it may change every day! Why? Simply because, for example, the access time of the file (of the inode) can change on every use. For links that are not used, the problem does not exist: I tested that by creating some links and leaving them untouched for many days;


- the problem seems connected to what Linux calls the "Virtual Filesystem", the in-memory representation of the real filesystem. In fact, the problem occurs more often if I put my computer to sleep than if I never stop the system. I suppose that the links get broken when the system flushes the VFS. I also tried to trigger the flush manually (with the purge command), but did not observe the problem afterwards;


- the problem affects only symbolic links; no regular file seems to be broken. Please note that symbolic links are not text files, as someone said in a previous post, but inodes in which the path to the target is written. This information is stored in the inode's data structure.


These observations seem to indicate that the problem arises from the cache management of the HFS+ filesystem and not from a hardware problem. My configuration is a Mac Pro with a RAID card, four hard drives and Mac OS X Mountain Lion. The same problem was also observed with Mac OS X Lion on the same hardware.


To help with debugging, I use the following commands:


- Find all broken links below the current directory:


/usr/bin/find -x . -type l ! -execdir test -e '{}' \; -ls | nl


- See low-level information:


stat -f "%N: %n%tAccess = %Sa %n%tModified = %Sm %n%tChanged = %Sc %n%tBirth = %SB %n%tInode = %i" <my_link>
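For anyone who would rather script this than run the find one-liner, here is a minimal Python sketch of the same idea. It is only an illustration: the function name find_broken_links is made up for this post, and it treats a link as "broken" simply when its target does not exist, exactly like the test -e check in the find command. It returns the raw bytes stored in each link, so a garbled target can be inspected without the terminal mangling it.

```python
import os


def find_broken_links(root):
    """Return (path, size, raw_target) for every dangling symlink under root.

    Paths and targets are handled as bytes so that garbled link
    contents are reported exactly as they are stored on disk.
    """
    root = os.fsencode(root)
    root_dev = os.lstat(root).st_dev
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories show up in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                st = os.lstat(path)
                broken.append((path, st.st_size, os.readlink(path)))
        # Do not descend into other mounted volumes (the -x of the find command).
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
    return broken
```

Calling find_broken_links("/usr/bin") and printing each tuple gives roughly the same information as the find command. The size field is the length of the stored target, which is why ls shows 16 for a link that should point to /Volumes/L2A/G2S.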


Finally, I suppose that the symlinks (the inodes) are rewritten by the HFS+ driver during operation but, due to a wrong pointer or something similar, it sometimes destroys the information. In fact, I have found that some broken links were pointing to part of the contents of recently received emails or documents (!!).


Moitx

Nov 21, 2012 11:54 AM in response to moitx

Hi all,


Today I found more corrupted links. Look at the following data, taken before the recovery:


# Broken (path is /usr/bin)

/usr/bin/find -x . -type l ! -execdir test -e '{}' \; -ls | nl

1 670900 16 lrwxr-xr-x 1 root wheel 5 Nov 1 19:16 ./cc -> typev

2 671137 16 lrwxr-xr-x 1 root wheel 12 Nov 1 19:16 ./gcc -> ??@?d1


stat -f "%N: %n%tAccess = %Sa %n%tModified = %Sm %n%tChanged = %Sc %n%tBirth = %SB %n%tInode = %i" gcc

gcc:

Access = Nov 1 19:16:51 2012

Modified = Nov 1 19:16:51 2012

Changed = Nov 6 22:20:37 2012 <---

Birth = Nov 1 19:16:51 2012

Inode = 671137


stat -f "%N: %n%tAccess = %Sa %n%tModified = %Sm %n%tChanged = %Sc %n%tBirth = %SB %n%tInode = %i" cc

cc:

Access = Nov 1 19:16:50 2012

Modified = Nov 1 19:16:50 2012

Changed = Nov 6 22:18:05 2012 <---

Birth = Nov 1 19:16:50 2012

Inode = 670900


As I said in my last post, the content of the links either comes from other files (see typev) or is completely unknown. The inode was modified by something (see the Changed timestamp) after the installation (see the Birth timestamp). cc and gcc are links that come from Xcode and belong to commands that I execute almost every day but obviously never changed. It is interesting that these links were corrupted at different moments and that the access time does not change with usage. If we want to resolve this problem, we need to find out what changes them.
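The Changed versus Birth comparison above can also be automated. The following is a rough Python sketch, not a finished tool: link_times and changed_after_creation are names invented for this example, and the one-second slack is an arbitrary assumption. Keep in mind that a later change time only says the inode was touched (a rename or chown does that too), so it is a hint about when something happened, not proof of corruption.

```python
import os


def link_times(path):
    """Return the timestamps of the link itself (lstat does not follow it)."""
    st = os.lstat(path)
    return {
        "access": st.st_atime,
        "modified": st.st_mtime,
        "changed": st.st_ctime,
        # st_birthtime is provided on macOS and BSD; None where it is missing.
        "birth": getattr(st, "st_birthtime", None),
        "inode": st.st_ino,
    }


def changed_after_creation(times, slack=1.0):
    """True if the inode change time is later than the creation time.

    Falls back to the modification time when no birth time is available.
    The slack (in seconds) absorbs small differences at creation.
    """
    created = times["birth"]
    if created is None:
        created = times["modified"]
    return times["changed"] - created > slack
```

Running changed_after_creation(link_times("/usr/bin/gcc")) on the link shown above would return True, since Changed (Nov 6) is days after Birth (Nov 1).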


Moitx

Nov 21, 2012 4:12 PM in response to moitx

So I have opened a bug report with Apple on the issue. In the meantime I'm looking for proof of whether the corruption is occurring only on symbolic links or on other files as well. Symbolic links are certainly the most noticeable, since even minor corruption produces an obvious difference. Corruption of a docx or binary might go unnoticed. For apps, however, the new signature verification related to sandboxing should produce specific signature errors if binaries are corrupted. Lacking these errors, my gut tells me they are not experiencing corruption, but I'd prefer to have better evidence. If wholesale corruption is occurring due to hardware failure, it should be evident if we can monitor the modification of files.


I'm looking into how best to create a checksum of the files in the filesystem to detect changes to any file, something akin to what Tripwire might do. Doing it via the command line on the entire drive is very slow, so I'm looking at starting with one or two folders.
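As a starting point for one or two folders, the checksum idea could look something like the Python sketch below. This is an assumption about how one might do it, not what Tripwire does internally: the function names and the "sha256:" / "link:" prefixes are invented here. Regular files are fingerprinted with SHA-256, while symbolic links are fingerprinted by their stored target, so a corrupted link shows up as a change even though the file it used to point to is intact.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each relative path under root to a fingerprint of its content."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # Record the link target itself, not the file it points to.
                manifest[rel] = "link:" + os.fsencode(os.readlink(path)).hex()
            elif os.path.isfile(path):
                try:
                    manifest[rel] = "sha256:" + file_digest(path)
                except OSError:
                    # Unreadable files are recorded rather than aborting the scan.
                    manifest[rel] = "unreadable"
    return manifest


def compare_manifests(old, new):
    """Return (changed, missing, added) as sorted lists of relative paths."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

The manifest is a plain dictionary of strings, so it can be saved with json.dump after the first run and loaded again later for compare_manifests. Anything in the "changed" list that you did not modify yourself is worth a closer look.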


Jeff

Nov 22, 2012 4:26 AM in response to Jmanis

You can use AIDE, available from MacPorts. It runs as a cron job and checks the current state of all the files in the filesystem against a database that contains the original signatures, sending you a report of what has changed. See http://aide.sourceforge.net. Be aware that if your filesystem is very large, AIDE will take a lot of time to complete; in that case, use a reduced configuration to check only what is really important.
