symbolic links get corrupted by system process?

Greetings Folks,


This was posted in another forum, so I'm reposting two messages here:


I am having a problem with symbolic links getting corrupted. I have a new Mac Pro running 10.7.3. I have defined symbolic links


/Users/walker/G2S -> /Volumes/L2A/G2S [this is pointing to a different partition on the same JBOD RAID]

/home -> /Users


The second link was created after unmounting /home and removing it from the /etc/auto_master file.


Both symbolic links worked for several days. But then for some reason, without a reboot, the links became corrupted:


> pwd

/Users/walker

> ls -al G2S

lrwxr-xr-x 1 walker staff 16 Mar 24 03:08 G2S -> X??G???Gҡ?G???G

> cd G2S

G2S: No such file or directory.


Same nonsensical definition for /home link. I repeat, this did not happen after a reboot. It first happened on /home. I thought that might have been related to a new OS handling of the "/home" label. So I deleted the /home link and did a clean reboot. The G2S link was created after that reboot, not before.


After the above two problems happened, I created a new symbolic link


/Users/walker/G2S2 -> /Volumes/L2A/G2S


I then did not use this new symbolic link in any of my processing scripts. A few weeks went by, then this link somehow got corrupted too:


lrwxr-xr-x 1 walker staff 16 Apr 2 17:22 G2S2 -> 꺄G???Gĺ?Gú?G


Does anyone here know how symbolic links are managed on a Mac (any process that controls their linking?), or have any information to help me figure out how to fix this? For example, could it be due to bad RAM? I have 32 GB.


Thank you,

Kris Walker

Mac Pro, Mac OS X (10.7.3)

Posted on Apr 20, 2012 3:44 PM
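A minimal Python sketch of one way to look for links damaged like the ones shown above. It is a heuristic, not a diagnosis: the function names are made up for illustration, and it assumes a healthy link target is valid UTF-8 with no control characters, which will not hold for every legitimate link.

```python
import os


def looks_garbled(target: bytes) -> bool:
    """Heuristic: the raw target is not valid UTF-8 or holds control characters."""
    try:
        text = target.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)


def find_suspect_links(root):
    """Yield (link_path, raw_target, reason) for symlinks under root that look damaged.

    Paths are handled as bytes so that a garbled target cannot break decoding.
    """
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        # Symlinks to directories are listed in dirnames but are not descended into.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if looks_garbled(target):
                yield path, target, "garbled"
            elif not os.path.exists(path):  # exists() follows the link
                yield path, target, "dangling"
```

Looping over `find_suspect_links("/Users/walker")` and printing each tuple would list candidates. A "dangling" result only means the target is missing right now (for example, an unmounted volume), so it is weaker evidence than "garbled".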

233 replies

Apr 15, 2013 4:03 PM in response to etresoft

I've rebuilt 2 different Mac Pro systems from scratch, and in both cases replaced a complete set of 4 drives. Yes, that is 16 drives in total. So yes, my first response was to drop what I was doing, swap in new media, install a fresh OS, and reinstall all the apps from scratch. The only files transferred were those in my home directory.


However, the drives have never tested bad, and the problem keeps coming back, and the only files I can confirm corrupted are soft link files.


Soft link files cannot easily be overwritten. Short of writing a buggy kext, or an application that opens drives in raw mode, I'm not sure how it could be done.


I could be wrong, but I'm assuming that none of the applications from the App Store can do either of those things. That leaves drivers, the OS, and around 10 command line apps that I grab through Homebrew.


I use Homebrew specifically because it doesn't need sudo, and last I checked you can't open a raw device unless you are root. There are only a few apps that I run with privileges: wireshark, lsof, vmware, and virtualbox. The latter two install kexts, so they could be causing the problem.


I think it is likely that something about how I need to configure my machines is triggering the problem, and that I've got to narrow it down so that I can get someone to fix their bug. I'm thinking it is Apple, but until I can make it reproducible, I can't know for sure.


My prior question still stands: if OS X has a microkernel, can subsystems stomp on each other?

Apr 15, 2013 4:07 PM in response to hstimer

hstimer wrote:


I use Homebrew specifically because it doesn't need sudo, and last I checked you can't open a raw device unless you are root. There are only a few apps that I run with privileges: wireshark, lsof, vmware, and virtualbox. The latter two install kexts, so they could be causing the problem.



Well on my MacPro that exhibited this problem, I too used Homebrew (avoiding the need for sudo) and did NOT run any of the apps you mentioned (wireshark, lsof, vmware and virtualbox). Also have tried replacing drives & fully rebuilding multiple times, to no avail.

Apr 15, 2013 4:14 PM in response to etresoft

If you have any suggestions to help track down the issue, I'd welcome them. Data does not appear to get corrupted here, so the issue is more an annoyance than a data loss risk (at least at the moment).


fileXray on my machine shows that the AttributeModDate is updated for affected files but no other date field (thus the file does not show as changed in ls -l). Many of the symlinks appear to be in consecutive node records in the Catalog File Thread Record even though they may not be in the same directory structure. The corruption appears to have occurred within a period of about 5 minutes.
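A hedged Python sketch of how that timestamp pattern could be checked without fileXray. It rests on an assumption: that the HFS+ AttributeModDate is what `lstat` reports as `st_ctime`, while `ls -l` prints the modification time (`st_mtime`). The function names and the one-second slack are illustrative only.

```python
import os


def link_times(path):
    """Return the timestamps of the link itself (lstat does not follow it)."""
    st = os.lstat(path)
    return {"mtime": st.st_mtime, "ctime": st.st_ctime}


def changed_without_mtime(path, slack_seconds=1.0):
    """True when the status-change time is later than the modification time.

    Assumption: a link whose attributes were rewritten after creation shows
    a ctime newer than its mtime, matching the pattern described above.
    """
    times = link_times(path)
    return times["ctime"] - times["mtime"] > slack_seconds
```

On the command line, `ls -lc` prints the status-change time instead of the modification time, so comparing it with plain `ls -l` gives the same information for a single link. Ordinary operations such as chmod or rename also move ctime forward, so a positive result is a hint, not proof of corruption.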

Apr 15, 2013 4:51 PM in response to hstimer

Apple has shipped tens of millions of machines with Lion and Mountain Lion. You can post messages here in this discussion forum until you wear your fingers to the bone and it won't do you any good. You need proof. Documentation. Anything, really.


Things that are not going to help your case:

Homebrew

Wireshark

lsof

VMWare

VirtualBox

Both VMWare and VirtualBox on the same system

Anything in /usr/local

Anything at /System/Hidden/Versions/Current


And anything, really, that isn't made by Apple. The thing is, you can't just judge something by whether you think you are avoiding the need for sudo. If you ever give any installer your admin password, you are giving it permission to run anything via sudo anytime it wants to. No password required.


It is unlikely that any of that 3rd party software is causing any problem. Lots of people run all of that with no issues. That doesn't apply to you or anyone claiming this issue. This is an extraordinary claim that demands proof. If anyone has such proof, or even 5 minutes to spare, they should be sending a bug report to Apple instead of bickering with me.

Apr 15, 2013 5:00 PM in response to etresoft

etresoft wrote:


Things that are not going to help your case:

Homebrew

Wireshark

lsof

VMWare

VirtualBox

Both VMWare and VirtualBox on the same system

Anything in /usr/local

Anything at /System/Hidden/Versions/Current

As mentioned before, Homebrew is specifically engineered to run with standard user permissions (i.e. NOT requiring sudo). Since link corruption has been reported in areas such as /System/Library/Frameworks, where mere mortals normally don't have license to muck about, it sounds highly unlikely that Homebrew would be the cause.


And lsof is part of the standard UNIX tools that Apple distributes as part of the base OS X install (it is located in /usr/sbin), so if that were the culprit then fixing it is part of their purview.


And there are users who have reported this issue who don't run any of the software mentioned (including myself: I do not use Wireshark, VMware, etc.). Not to mention people who have reported this problem occurring even on "vanilla" rebuilds.

Apr 15, 2013 5:13 PM in response to hstimer

We're just regular users like yourself (except that we happen to know more than most people about Macs). Finding a workaround or a reproducible scenario is a fine goal, but there are 13 pages in this thread with only a few on target. Most of it is griping about some error people think Apple has made, even though they can't say precisely what the error is.


You all can gripe here all you like, and all you're going to manage to do is make people grumpy. We're not Apple.


To be honest, I gave up trying to help on this thread months ago, when it became clear to me that people didn't want to solve the problem, they wanted to complain until someone else solved it. There's nothing I can do with that, since it's not a problem I can reproduce myself. If you have an earnest interest in solving this problem, I suggest you file a bug report, and if you want to discuss it some more here use Apple's bug report template to describe the problem. Then we might get somewhere.

Apr 15, 2013 6:00 PM in response to twtwtw

I've been trying to track this down for the last few months. I have put in a bug report, to which Apple kindly responded with "Please go see your nearest Genius bar to check on hardware" and eventually closed it....(sigh). Evidence so far appears to be:


- Only appears to happen to people with boot drives greater than approximately 2 TB. Reducing the size appears to either make the problem go away or reduce the chances of it occurring. Analysis: my guess is that the majority of users do not upgrade their hardware, so they stay below a 2 TB boot partition and never see the issue.

- Symlink corruption appears to only occur to executable code - personal experience suggests things like /Applications, though I have seen links in /usr/local/cuda/lib (used to run CUDA applications under BOINC) fail. For me the majority of failures appear to be "Resources" or "Current" symlinks under apps like MS Office. I have not seen data failures as yet, or corruption of data symlinks (I use Aperture, which is heavily dependent on symlinks, so far with no issue).

- fileXray appears not to log changes to the file system even though it is in monitor mode, and the symlink failures look like something is writing to file attributes, not content (only AttributeModDate is updated when corruption occurs).

- (brief) analysis of the latest failure suggests that the failures are closely related in the Catalog File Thread Table (whatever that is... ;-( ) in HFS+, suggesting some process is overwriting the same critical data portions.

- Corruption is, as yet, unreproducible but appears to occur in batches - i.e. the machine works fine for a period of time, then suffers symlink corruption. I can't find logs to identify what was running around the same time.


There appear to be several people with this issue and all appear to be in the camp of "Upgraded machine to use larger capacity disks than is generally supplied by Apple"
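Since the corruption is described as arriving in batches with no log to date it, one option is to record link targets at regular intervals and compare the recordings. The following is a rough Python sketch under that assumption; the function names are hypothetical, and scheduling it (cron, launchd) is left out.

```python
import os


def snapshot_links(root):
    """Map every symlink path under root to its raw target, hex-encoded.

    Hex keeps garbled (non-UTF-8) targets intact, so a snapshot can be
    stored as JSON together with the time it was taken.
    """
    links = {}
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links[os.fsdecode(path)] = os.readlink(path).hex()
    return links


def changed_links(before, after):
    """Return the paths whose target differs between two snapshots."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

Two snapshots taken, say, five minutes apart narrow a batch down to that window, which could then be matched against whatever system logs exist for the same period. Links that are deliberately re-pointed (application updates, for example) will also show up, so the output needs reading by hand.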

Apr 15, 2013 6:06 PM in response to Ed Newman

Ed Newman wrote:


I've been trying to track this down for the last few months. I have put in a bug report, to which Apple kindly responded with "Please go see your nearest Genius bar to check on hardware" and eventually closed it....(sigh). Evidence so far appears to be:


- Only appears to happen to people with boot drives greater than approximately 2 TB. Reducing the size appears to either make the problem go away or reduce the chances of it occurring. Analysis: my guess is that the majority of users do not upgrade their hardware, so they stay below a 2 TB boot partition and never see the issue.

- Symlink corruption appears to only occur to executable code - personal experience suggests things like /Applications, though I have seen links in /usr/local/cuda/lib (used to run CUDA applications under BOINC) fail. For me the majority of failures appear to be "Resources" or "Current" symlinks under apps like MS Office. I have not seen data failures as yet, or corruption of data symlinks (I use Aperture, which is heavily dependent on symlinks, so far with no issue).

- fileXray appears not to log changes to the file system even though it is in monitor mode, and the symlink failures look like something is writing to file attributes, not content (only AttributeModDate is updated when corruption occurs).

- (brief) analysis of the latest failure suggests that the failures are closely related in the Catalog File Thread Table (whatever that is... ;-( ) in HFS+, suggesting some process is overwriting the same critical data portions.

- Corruption is, as yet, unreproducible but appears to occur in batches - i.e. the machine works fine for a period of time, then suffers symlink corruption. I can't find logs to identify what was running around the same time.


There appear to be several people with this issue and all appear to be in the camp of "Upgraded machine to use larger capacity disks than is generally supplied by Apple"

I'd like to add to this. In my experience the corruption occurs to apps that are used frequently, and also that (at least for me) corruption wasn't limited to executable code. For example, I frequently got corrupted links in the Mail.app sandbox container (~/Library/Containers/com.apple.mail), which contains data, not executable code, and Mail is probably my #1 used app. Also I often got corrupted links in any frameworks that I add to Xcode projects (frameworks often contain links in them). But only to the apps that I was working on at the time -- none of the symlinks in my other, inactive Xcode projects appeared to corrupt themselves.

Apr 15, 2013 6:16 PM in response to hstimer

As requested:


112 0 0xffffff7f824ac000 0x3000 0x3000 com.bresink.driver.BRESINKx86Monitoring (9.0) <5 4 3>

144 0 0xffffff7f82519000 0x5000 0x5000 com.trusteer.driver.gakl_driver_2 (1) <29 7 5 4 3 1>

155 0 0xffffff7f82549000 0x2000 0x2000 com.nvidia.CUDA (1.1.0) <4 1>



Filesystem     512-blocks       Used  Available Capacity     iused     ifree %iused  Mounted on
/dev/disk5    11719722624 2710149632 9009060992      24% 169416350 563066312    23%  /
devfs                 418        418          0     100%       724         0   100%  /dev
/dev/disk4      976101312     875896  975225416       1%    109485 121903177     0%  /Volumes/Extra Disk
map -hosts              0          0          0     100%         0         0   100%  /net
map auto_home           0          0          0     100%         0         0   100%  /home
/dev/disk6s2   7813186512 7707351904  105834608      99% 963418986  13229326    99%  /Volumes/BackupDrive


df -k

new-host-2:/ root# df -k
Filesystem    1024-blocks       Used  Available Capacity     iused     ifree %iused  Mounted on
/dev/disk5     5859861312 1355076120 4504529192      24% 169416513 563066149    23%  /
devfs                 209        209          0     100%       724         0   100%  /dev
/dev/disk4      488050656     437948  487612708       1%    109485 121903177     0%  /Volumes/Extra Disk
map -hosts              0          0          0     100%         0         0   100%  /net
map auto_home           0          0          0     100%         0         0   100%  /home
/dev/disk6s2   3906593256 3853689892   52903364      99% 963422471  13225841    99%  /Volumes/BackupDrive
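For reference, the block counts reported for the boot volume (/dev/disk5) can be converted to a size in decimal terabytes. This is plain arithmetic on the figures above; the helper name is illustrative.

```python
def blocks_to_tb(blocks, block_size):
    """Convert a df block count to decimal terabytes (1 TB = 10**12 bytes)."""
    return blocks * block_size / 1e12


# Boot volume totals as reported by the two df listings.
boot_from_512 = blocks_to_tb(11719722624, 512)
boot_from_1k = blocks_to_tb(5859861312, 1024)

print(round(boot_from_512, 2), round(boot_from_1k, 2))
```

Both listings give the same total, roughly 6 TB, which puts this boot volume well above the approximately 2 TB threshold mentioned earlier in the thread.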


And in response to dburr - the issue appears to only occur to apps that are in use.

Apr 15, 2013 6:38 PM in response to dburr

dburr wrote:


As mentioned before, Homebrew is specifically engineered to run with standard user permissions (i.e. NOT requiring sudo). Since link corruption has been reported in areas such as /System/Library/Frameworks, where mere mortals normally don't have license to muck about, it sounds highly unlikely that Homebrew would be the cause.

There is nothing special about Homebrew. One of the first replies in this thread contained the following:


Launch Daemons:

[loaded] homebrew.mxcl.memcached.plist

[loaded] homebrew.mxcl.redis.plist


That is Homebrew running with root permissions, doing whatever it wants.


And lsof is part of the standard UNIX tools that Apple distributes as part of the base OS X install (it is located in /usr/sbin), so if that were the culprit then fixing it is part of their purview.


That is just one of the things that hstimer reported as running with root permissions. That is not something one normally does. As I said before, there is nothing known, running as root or not, that will cause this other than hardware failure. If you think it is a software failure, then the burden is on you to prove it.


And there are users who have reported this issue that don't run any of the software mentioned (including myself - I do not use Wireshark, Vmware, etc.) Not to mention people who have reported this problem as occurring even on "vanilla" rebuilds.


And that brings us right back to reply #5 in this thread I think. A dozen people, with radically different machines, are complaining. Some of whom have startup volumes of 3 TB, and some have startup volumes of 750 GB. Some say Lion causes it. Some say Mountain Lion is required. What is the common thread between them? They all seem to be doing unusual things and experiencing unusual results.


You claim people are reporting this on "vanilla" systems? They might be in there somewhere. What I see is everyone running some very unusual software. There is nothing wrong with running unusual software. I do that myself. But if you run unusual software and get unusual results, it is your responsibility to track it down.


Ergo, there is a problem that is impacting 0.0004875 % of all Mac users. I think it is a hardware problem. Apparently Apple thinks it is a hardware problem. You think it is something else? You have to PROVE IT!


Report the bug to Apple. Complaining that somebody else on the internet claims to have done that and Apple told them to go to the Genius bar is not the same thing as "reporting the bug to Apple". Apple does not read these forums for good reason. This is not a problem that anyone can fix via an internet forum. Posting here is equivalent to doing nothing. The only thing that I, or anyone else, gets out of posting in this thread is more wasted time.

This thread has been closed by the system or the community team. You may vote for any posts you find helpful, or search the Community for additional answers.
