
symbolic links get corrupted by system process?

Greetings Folks,


This was posted in another forum, so I'm reposting two messages here:


I am having a problem with symbolic links getting corrupted. I have a new Mac Pro running 10.7.3. I have defined these symbolic links:


/Users/walker/G2S -> /Volumes/L2A/G2S [this is pointing to a different partition on the same JBOD RAID]

/home -> /Users


The second link was created after unmounting /home and removing it from the /etc/auto_master file.


Both symbolic links worked for several days. But then for some reason, without a reboot, the links became corrupted:


> pwd

/Users/walker

> ls -al G2S

lrwxr-xr-x 1 walker staff 16 Mar 24 03:08 G2S -> X??G???Gҡ?G???G

> cd G2S

G2S: No such file or directory.


Same nonsensical definition for /home link. I repeat, this did not happen after a reboot. It first happened on /home. I thought that might have been related to a new OS handling of the "/home" label. So I deleted the /home link and did a clean reboot. The G2S link was created after that reboot, not before.
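One way to see what a corrupted link actually contains, rather than the `?` characters `ls` substitutes for unprintable bytes, is to read the raw target bytes. Note that for a symlink the size column in `ls -l` is the length of the stored target: `/Volumes/L2A/G2S` is 16 bytes, which matches the `16` shown for both corrupted links, suggesting the stored length survived while the bytes changed. The Python 3 sketch below is illustrative only: `raw_target`, `looks_garbled` and `describe` are hypothetical helper names, and the "garbled" test is just a heuristic (not valid UTF-8, or containing control characters).

```python
import os
import sys


def raw_target(link_path):
    """Return the symlink's stored target as raw bytes, without decoding."""
    # A bytes path makes os.readlink return bytes, so invalid UTF-8 is kept
    # exactly as stored instead of being replaced for display.
    return os.readlink(os.fsencode(link_path))


def looks_garbled(target):
    """Heuristic: a healthy target is valid UTF-8 with no control characters."""
    try:
        text = target.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)


def describe(link_path):
    target = raw_target(link_path)
    return {
        "path": link_path,
        "length": len(target),   # compare with the size column of ls -l
        "hex": target.hex(),     # byte-for-byte view of the target
        "garbled": looks_garbled(target),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(describe(path))
```

Run it with one or more link paths as arguments, e.g. the `G2S` and `G2S2` links above.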


After the above two problems happened, I created a new symbolic link


/Users/walker/G2S2 -> /Volumes/L2A/G2S


I then did not use this new symbolic link in any of my processing scripts. A few weeks went by, then this link somehow got corrupted too:


lrwxr-xr-x 1 walker staff 16 Apr 2 17:22 G2S2 -> 꺄G???Gĺ?Gú?G


Does anyone here know how symbolic links are managed on a Mac (any process that controls their linking?), or have any information to help me figure out how to fix this? For example, could it be due to bad RAM? I have 32 GB.


Thank you,

Kris Walker

Mac Pro, Mac OS X (10.7.3)

Posted on Apr 20, 2012 3:44 PM

233 replies

Apr 21, 2013 5:56 AM in response to etresoft

etresoft wrote:


When the installer installs the operating system, it writes all the files and then creates any symbolic links. On a journaled file system like HFS+, those writes are going to occur on previously unused portions of the disk. Therefore, symbolic links are likely to reside physically close to other links on the surface of the disk. And, on big enough drives, they may be on previously unused blocks. Considering the wide range of setups reported in this thread, there is likely to be little in terms of SMART reporting.


All drives have a certain percentage of bad blocks. The bigger the drive, the more bad blocks. External DIY RAIDs on big, cheap, first-generation, high-density drives are particularly sensitive to this possibility. Those whose blocks fail in critical portions of the disk are alerted when something breaks. If those bad blocks contain only symbolic links, then a failure might not be noticed and would seem to affect multiple symbolic links, such as those created en masse by an installer.


If it were a hardware failure such as I have described above, then there would likely be a low failure rate that matches the handful of reports in this thread. A software problem would affect all systems having the same configuration. Quite frankly, the rest of the world hasn't noticed anything like that. Ergo, it is probably hardware problems being repeated dozens of times among tens of millions of drives.


Here we go again... Etresoft, this is starting to be ridiculous. Are you purposefully trolling? I mean, this would definitely be at home on 4chan; we could make it a running gag.


I've gone through my fair share of HDDs in my time, on Linux, Windows, etc. I've seen what a HDD failure looks like. I've had a Windows and a Linux system die of hard drive damage. It was as if the world was falling apart, not "omg, sometimes some symlinks has garbage in them".

(The rest of the world may not have noticed, as they aren't using Java or Xcode. In fact, I know a Mac repairman who DOESN'T EVEN KNOW WHAT a symbolic link IS! 99.9% of those "tens of millions" of drives are in the hands of people who know jack **** about computers. They have an issue, they take the machine to the Genius Bar, and Apple does something there. Maybe they replace the drive; then the guy goes home, the problems keep happening, he takes it back and gets angry, the Genius Bar reports the bug in the internal tracking system, whatever, and of course we don't hear about it in this forum. Don't make the mistake of believing that the only people who have the problem are the ones who post in this forum.)


If it is hardware failure, then suggesting that it's media corruption is like saying that the reason your wife cannot find the book she is looking for is that there was a fire in your home and it got burned. (Of course, there is absolutely no indication that there may have been a fire, and the only thing missing is the book.) It's ludicrous.

I have no idea why you keep coming back to this idiotic explanation. You seem like a pretty seasoned expert; just stop and use your brain for a moment.

I mean even if we somehow assume that the media is wrong... Okay, first go, the OS writes the links on bad blocks, and the drive doesn't notice. Now please explain the following:

- The symlinks do not START OFF bad. They "go bad" after a few days of use.

- If you recreate the links, they will be good for a few days, then they go bad again.

- The same drive withstands A WHOLE WEEK of destructive write/read tests without a SINGLE error, and then keeps working in a Windows machine without as much as a hiccup.


It can be hardware failure - I'm thinking of some kind of lying cache manager, bad flush strategy, adapter issues, etc. Media corruption doesn't look like this. One thing is for certain. It's not the HDD's fault. It's either Apple's hardware, or some kind of weird unpublished incompatibility between their hardware and certain HDDs.

Apr 21, 2013 8:18 AM in response to Iosepho

Iosepho wrote:


Here we go again... Etresoft, this is starting to be ridiculous. Are you purposefully trolling? I mean, this would definitely be at home on 4chan; we could make it a running gag.

It is my sincere desire that this thread die the death it deserves. I did not resurrect it. Someone else did that in order to put words into my mouth. All I did was respond to that and try to explain how a journaled file system works.


Now please explain the following:

- The symlinks do not START OFF bad. They "go bad" after a few days of use.

- If you recreate the links, they will be good for a few days, then they go bad again.

- The same drive withstands A WHOLE WEEK of destructive write/read tests without a SINGLE error, and then keeps working in a Windows machine without as much as a hiccup.


Sure sounds like hard drive failure to me.


It can be hardware failure - I'm thinking of some kind of lying cache manager, bad flush strategy, adapter issues, etc. Media corruption doesn't look like this. One thing is for certain. It's not the HDD's fault. It's either Apple's hardware, or some kind of weird unpublished incompatibility between their hardware and certain HDDs.

There are a number of big red flags in this thread that mark it as highly suspicious. Of course, anything is possible. That isn't the issue. The issue here is that people are saying, nay insisting, that the cause is some Apple kernel bug and that hardware failure is impossible. No one files bug reports. No one takes their machine in for professional diagnostics. Since it has been proven that the hardware is without flaw, people just recommend reinstalling. Just sit on your hands. This is Apple's bug to fix. Just wait for 10.8.4. Or maybe 10.9. Or just use Linux. Or maybe Windows. Yeah, that's it.


That is all nonsense. Any reasonable person who is really experiencing something that looks this much like hard drive failure needs to replace the hardware. Any reasonable person who thinks they have found some software bug would report it to Apple instead of waiting for some magical software update to fix it. Any reasonable person who claims to have found a bug in an OS kernel used by something like 40 million people would be looking for proof. Anyone who cannot demonstrate said proof probably needs to give up building their own RAID arrays and buy a professionally made one.


I have no interest in continuing with this thread. You can keep running your scripts and debugging kernel source code all you want. I will only respond to personal attacks or people twisting my words. Ergo, it will be easy to see what people are really interested in. If they are interested in fixing their machines, they can help each other out. I have offered all the assistance I can for that endeavour. If they are only interested in ranting and personal attacks, I will be happy to continue with my contributions. Your choice.

Apr 21, 2013 10:12 AM in response to etresoft

Strangely enough, I haven't seen anyone insisting that it is a kernel bug, although I have seen one person insisting that it is a hardware failure. Strangely enough, I have not seen anyone insisting that it cannot be a hardware failure, although I have seen one person insisting that it cannot possibly be a software failure. Strangely enough, I made an AppleCare call on this issue, as have a fair number of other people, and yet there is one person who insists that nobody files bug reports.


I will admit that I haven't brought my system in for professional hardware diagnostics but the problem went away when I repartitioned the hard drive and I was sufficiently relieved that the problem went away that I just left things at that.


While I don't know if we have a hardware problem or a software problem, it is clear that this discussion has been invaded by someone who does not seem to be encountering the issue and likes to stir things up with propositions for which he provides no supporting evidence other than a theory involving a really quite unusual hardware failure scenario not to mention one which leaves no traces in the system logs (I looked _hard_ for any trace at all of hardware failure reports in my system logs when I was encountering this problem and came up empty).

Apr 21, 2013 11:33 AM in response to daboulet

daboulet wrote:


Strangely enough, I haven't seen anyone insisting that it is a kernel bug, although I have seen one person insisting that it is a hardware failure. Strangely enough, I have not seen anyone insisting that it cannot be a hardware failure, although I have seen one person insisting that it cannot possibly be a software failure. Strangely enough, I made an AppleCare call on this issue, as have a fair number of other people, and yet there is one person who insists that nobody files bug reports.

Then why are you commenting at all, seeing as how you obviously haven't even read any of the posts in this thread?


I will admit that I haven't brought my system in for professional hardware diagnostics but the problem went away when I repartitioned the hard drive and I was sufficiently relieved that the problem went away that I just left things at that.

That's the way to diagnose potential hardware failures that could lead to data loss. Change the configuration so you don't see it.


it is clear that this discussion has been invaded by someone who does not seem to be encountering the issue and likes to stir things up with propositions for which he provides no supporting evidence other than a theory involving a really quite unusual hardware failure scenario not to mention one which leaves no traces in the system logs (I looked _hard_ for any trace at all of hardware failure reports in my system logs when I was encountering this problem and came up empty).

So, being the first, and only, person trying to help the original poster for the first four months of this thread's existence is "invading"? I can't imagine what you would call hijacking a long-dead thread with irrelevant, "me too" replies that have little resemblance to the original poster's question.


Unfortunately, hardware failures usually aren't kind enough to announce their presence with a friendly entry in your system log.


Looking back on my first few posts, over a year ago now: the original question involved symbolic links, manually created in an automount directory of a boot drive, pointing into a large JBOD RAID. The system in question is a client system hacked up to run as a server. I was more interested in automount. The problem seems specific to one heavily modified DIY machine. I'm satisfied that all of these problems have a similar cause. This thread has no further technical interest for me. I shall only continue in order to meet the wishes of people looking for a fight.

Apr 21, 2013 1:04 PM in response to etresoft

I've been on this thread since late December of 2012 or maybe January of 2013. I'm not aware of the entire history. Apologies for suggesting that you were 'invading'. If your characterization of how this thread started is correct (and I have no reason to believe that it isn't) then it also appears that the topic of this thread has drifted a ways away from where it started. That is also unfortunate.


I simply cannot be bothered to respond to any of the other points you've made. Doing so would not advance this discussion at all and would probably result in yet another response which only dealt with about half of what I had to say.


I'm done with this thread. Bye.

Apr 21, 2013 1:15 PM in response to twtwtw

I agree. Please take the personal attacks elsewhere and either provide suggestions on how to move discussion forward or stay quiet.


Current analysis / thread suggests the following:


- Users with upgraded disks whose boot partitions are larger than 2 TB appear to see the issue. Repartitioning to smaller sizes appears to remove (reduce??) the problem

- Only the symlink target value gets corrupted (no one appears to report data content issues). I have seen mainly binary or executable links (often, for me, the Resources and Versions/Current links under Frameworks) but believe I've seen others with data links

- fileXray does not appear to show the write being made to the file system, but affected files appear to show (for me at least...) an updated attributeModDate but not an updated contentModDate. Since symlinks are attributes rather than content, this appears logical.

- Symlink corruption survives a reboot, so the data is being written to disk. It does not matter whether the disk is RAID or not.

- Symlink corruption has, for me, hit the same symlinks again (anecdotal evidence suggests something to do with active / hot files), so the issue appears less likely (at the risk of starting a flame war) to be disk hardware than something else. New symlinks are seen to be created in new disk blocks, so localised disk failures appear to be ruled out

- I and others have filed software bug reports with Apple, with the only response being to take the hardware to the Genius Bar. Hardware self-diagnostic tests do not appear to show any issues, and I have run Speedtools to test read/write consistency and performance.

- The content of the corruption appears to be data from other applications rather than totally random data (I have seen HTML / XML in overwritten data). The script provided by one of the contributors to the thread detects binary (non-ASCII) data, but I have seen corruption with ASCII characters
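Since a non-ASCII check misses some cases, here is a Python 3 sketch of a scanner that reports two conditions separately: targets that are not valid UTF-8 or contain control characters ("garbled"), and targets that simply do not resolve ("dangling"). It is not the contributor's script; `looks_garbled` and `scan` are hypothetical names and the garbled test is a heuristic. It still cannot catch corruption that happens to produce a plausible ASCII path that resolves, and a "dangling" hit is not proof of corruption, since some applications ship links to paths that no longer exist.

```python
import os
import sys


def looks_garbled(target):
    """Heuristic: healthy targets decode as UTF-8 and contain no control characters."""
    try:
        text = target.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)


def scan(root):
    """Yield (link_path, target_bytes, reason) for suspicious symlinks under root."""
    # Walking with a bytes root keeps every name and target as raw bytes.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        # os.walk does not descend into symlinked directories by default, but it
        # still lists them in dirnames, so both lists are checked here.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if looks_garbled(target):
                yield path, target, "garbled"
            elif not os.path.exists(path):  # follows the link; False if it dangles
                yield path, target, "dangling"


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, target, reason in scan(sys.argv[1]):
        print(reason, os.fsdecode(path), "->", target)
```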


Personally, my next avenue of investigation is to use Instruments/DTrace within Xcode to see whether I can catch the disk write. The challenge has been finding a way to reproduce the problem at will so that these tools can capture what the system is doing at the time.


As an aside, symlinks in general appear to be very poorly managed on OS X (the OS makes no actual guarantee of correctness), and I have seen many more false links that appear to be due to bad application configuration than due to any corruption. A prime example is the Sparkle framework, where a French resource link points to a directory under "/Users/andym/....." (fr_CA.lproj -> /Users/andym/Development/Build Products/Release/Sparkle.framework/Resources/fr.lproj). However, many other links on my system appear to point to non-existent files (GIMP, as another example, points to /tmp/help/Gimp, which gets removed on reboot after install).



If anyone has found a way to reproduce this consistently please let us know.

Apr 22, 2013 11:46 AM in response to Ed Newman

I hesitate to contribute as I am not erudite in programming, but perhaps my problem can help isolate the issue. I found this thread while searching on the results of a permissions error log I got after having to reinstall Mountain Lion 10.8.3 on a MacBook Pro Retina with all the bells and whistles, purchased at the Apple Store in December 2012. Since it has a 750GB SSD, I don't use any RAID, but I do store data on an external portable hard drive. I haven't tried to fix it; it doesn't seem to be affecting the machine so far... but I would like to know what to do... or not do!! Thanks.


2013-04-22 09:08:30 -0400: Verifying volume “Macintosh HD”

2013-04-22 09:08:30 -0400: Starting verification tool:

2013-04-22 09:08:30 -0400: Checking file system

2013-04-22 09:08:56 -0400: Performing live verification.

2013-04-22 09:08:56 -0400: Checking Journaled HFS Plus volume.

2013-04-22 09:08:56 -0400: Checking extents overflow file.

2013-04-22 09:08:56 -0400: Checking catalog file.

2013-04-22 09:08:57 -0400: Checking multi-linked files.

2013-04-22 09:08:57 -0400: Checking catalog hierarchy.

2013-04-22 09:08:57 -0400: Checking extended attributes file.

2013-04-22 09:08:57 -0400: Checking volume bitmap.

2013-04-22 09:08:57 -0400: Checking volume information.

2013-04-22 09:08:57 -0400: The volume Macintosh HD appears to be OK.

2013-04-22 09:08:57 -0400: Repair tool completed:

2013-04-22 09:08:57 -0400:

2013-04-22 09:08:57 -0400:


2013-04-22 09:09:06 -0400: Verifying permissions for “Macintosh HD”

2013-04-22 09:09:44 -0400: Permissions differ on “System/Library/Frameworks/CoreGraphics.framework/CoreGraphics”; should be lrwxrwxrwx ; they are lrwxr-xr-x .

2013-04-22 09:09:45 -0400: Permissions differ on “System/Library/Frameworks/CoreGraphics.framework/Resources”; should be lrwxrwxrwx ; they are lrwxr-xr-x .

2013-04-22 09:09:45 -0400: Permissions differ on “System/Library/Frameworks/CoreGraphics.framework/Versions/Current”; should be lrwxrwxrwx ; they are lrwxr-xr-x .

2013-04-22 09:10:10 -0400:

2013-04-22 09:10:10 -0400: Permissions verification complete

2013-04-22 09:10:10 -0400:

2013-04-22 09:10:10 -0400:

Apr 22, 2013 1:00 PM in response to Traveler

Traveler:


Don't be concerned about the permissions errors in your report. You can use the Disk Utility "Repair Permissions" to fix those. It is quite common to find permission errors when you run the "Verify Permissions" feature of Disk Utility.


The problem we are discussing is quite different. In our case, symbolic links are becoming corrupted on disk drives that are greater than 2TB on certain systems. You are correct that your report shows symbolic links with permissions errors, but the problem in this thread is not a problem with file permission errors.


In our case, it is the target of the symbolic link that is incorrect, not the file permissions.
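To make the distinction concrete: the "Permissions differ" lines compare mode bits, which are stored separately from the link's target. This short Python 3 sketch uses the standard `stat` module to decode the two strings from the log and shows that they differ only in the group and other write bits; it does not read or check any link target.

```python
import stat

# Mode bits for the two strings in the "Permissions differ" log lines.
expected = stat.S_IFLNK | 0o777   # "should be lrwxrwxrwx"
actual = stat.S_IFLNK | 0o755     # "they are lrwxr-xr-x"

print(stat.filemode(expected))  # lrwxrwxrwx
print(stat.filemode(actual))    # lrwxr-xr-x

# XOR of the permission bits shows exactly which bits differ.
print(oct(stat.S_IMODE(expected) ^ stat.S_IMODE(actual)))  # 0o22 (group/other write)
```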


Hope this helps.

Apr 23, 2013 11:20 AM in response to Ed Newman

This is what I've discovered so far; I am only reporting what I've directly observed.


Machine configurations. All running 10.8.x
1. MacBook Air 11 -- no corruptions detected

* VMWare

2. MacBook Air 13 -- no corruption detected

* Only appstore apps

3. MacBook Pro 17 -- corruption detected

* Homebrew

* VMWare

* VirtualBox

4. MacPro 2012 - corruption detected

* Homebrew

* VMWare

* VirtualBox

* Apple software raid (10)


* On the MacPro, no errors were found via TechTool Pro and Apple Disk Utility

* On the MacPro, corruptions appear even when VMWare and Virtualbox are not running (but their kexts are still loaded)

* On the MacPro, a complete disk replacement and a from-scratch reinstall of all software do not fix the problem

* On the MacPro, checksum verification of over 50% of the files did not find any other corruption. Checksums were not easily available for all files, so I only have verification for about half the files.

* Inspection of the files with data blocks immediately before or after the corrupted data block show no corruptions

* All corrupted blocks appear in the range of 0xfc11f-0xfc395

* All corrupted files have high temperatures i.e. hot

* Almost all the top 'hot' files seem to be located in the 0xfc*** area

* Corruption can appear overnight when the machine is idle, but not asleep
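For a sense of scale, the reported block range can be converted to a block count and a byte offset. This Python 3 arithmetic assumes the numbers are allocation blocks counted from the start of the volume and that the allocation block size is 4096 bytes (a common HFS+ default); both are assumptions, so substitute the block size fileXray reports for your volume. Under those assumptions the 631 blocks span about 2.46 MiB, starting roughly 3.94 GiB into the volume.

```python
# Assumption: 4096-byte allocation blocks, numbered from the start of the volume.
BLOCK_SIZE = 4096

first, last = 0xFC11F, 0xFC395       # range reported in the post
count = last - first + 1             # inclusive range
start_byte = first * BLOCK_SIZE
span_bytes = count * BLOCK_SIZE

print(first, last, count)                 # 1032479 1033109 631
print(start_byte)                         # 4229033984
print(round(start_byte / 2**30, 2))       # 3.94 (GiB)
print(round(span_bytes / 2**20, 2))       # 2.46 (MiB)
```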


Tools used:

fileXray

dtrace

https://github.com/daboulet/MacOSX-broken-symlink-finder

Disk Utility

TechTool Pro

codesign


Other discoveries:

* Block writes happen asynchronously in the kernel, independently of the user process; this makes it more complicated to find who triggered the block write

* When a new soft link is created, its data block is allocated outside of the "hot" zone

* HFS does safe writes i.e. doesn't write over the existing data block, but instead writes to a new block


I have dtrace watching for a particular block write on a heavily used softlink, but I'm not expecting to learn much from that probe. I'm sure someone with more knowledge and time would be able to create the right set of fbt probes and nail this problem down.


There is a debug kernel that you can download and install; I haven't tried using it, but I would imagine the extra symbol information would make fbt probes much more pleasant. Of course, the debug kernel could make the problem go away.

Apr 23, 2013 11:48 AM in response to hstimer

Pardon me if someone has suggested this before, but it would be very useful if you could run the MacPro in Safe Mode for a while (or at least in the cleanest, simplest, most Mac-only environment you can construct). If link corruption occurs, we've eliminated 3rd-party software problems; if it doesn't, we've isolated the issue to 3rd-party software.

May 1, 2013 2:55 PM in response to btcreeper

btcreeper wrote:


Everyone running Sophos antivirus?


Recent events:

I uninstalled Sophos on 4/13/2013.

Installed Symantec Internet Security Suite on 4/17/2013.

Installed latest version of Java on 4/17/2013.


Today I scanned for corruption and nothing appeared.


Computer usage and habits have not changed at all. I'll continue to monitor the system.

Unfortunately, in my case, I am not running Sophos Antivirus. Never ever touched it. In fact, I never ran any antivirus of any kind until after the problem started showing up. (I had to install one at that point just to make sure that my trouble wasn't being caused by some sort of virus or Trojan.)

May 1, 2013 6:11 PM in response to btcreeper

btcreeper wrote:


Everyone running Sophos antivirus?


Nope. Not running it, never installed it.


In case you want to get a list of the symlinks on your system to refer to if (when?) corruption occurs, you can use a command like this:


sudo find / -xdev -type l -ls > linklisting.txt


That will show you every symlink and what it points to. Then, if you later find corrupt links, you'll at least have a reference: a quick grep of that file will show you what each link should point to, so you can correct it.
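If you would rather compare automatically than grep, a Python 3 sketch along these lines saves a baseline of every link's target and later reports any link whose target has changed, together with its original value. The names `snapshot`, `changed_links`, `save` and `load`, and the JSON file format, are hypothetical choices. Unlike `find -xdev`, this walk does not stop at filesystem boundaries, so point it at a specific directory. Undecodable bytes are kept via Python's surrogate escapes, so a garbled target still round-trips through the baseline file.

```python
import json
import os
import sys


def snapshot(root):
    """Return {link_path: target} for every symlink under root."""
    links = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:  # symlinked dirs appear in dirnames
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # str paths use surrogate escapes for undecodable bytes, so a
                # garbled target is preserved rather than raising an error.
                links[path] = os.readlink(path)
    return links


def changed_links(before, after):
    """Yield (path, old_target, new_target) for links whose target changed."""
    for path, old in before.items():
        new = after.get(path)
        if new is not None and new != old:
            yield path, old, new


def save(links, filename):
    # json.dump's default ensure_ascii=True escapes surrogates, so this round-trips.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=1, sort_keys=True)


def load(filename):
    with open(filename, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__" and len(sys.argv) == 4:
    command, root, baseline = sys.argv[1:]
    if command == "save":
        save(snapshot(root), baseline)
    elif command == "check":
        for path, old, new in changed_links(load(baseline), snapshot(root)):
            print(f"{path}\n  was: {old!r}\n  now: {new!r}")
```

Run it once with `save` to record the baseline, then periodically with `check` against the same directory and file (system directories may need `sudo`).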
