symbolic links get corrupted by system process?

Greetings Folks,


This was posted in another forum, so I'm reposting two messages here:


I am having a problem with symbolic links getting corrupted. I have a new Mac Pro running 10.7.3. I have defined these symbolic links:


/Users/walker/G2S -> /Volumes/L2A/G2S [this is pointing to a different partition on the same JBOD RAID]

/home -> /Users


The second link was created after unmounting /home and removing it from the /etc/auto_master file.


Both symbolic links worked for several days. But then for some reason, without a reboot, the links became corrupted:


> pwd

/Users/walker

> ls -al G2S

lrwxr-xr-x 1 walker staff 16 Mar 24 03:08 G2S -> X??G???Gҡ?G???G

> cd G2S

G2S: No such file or directory.


Same nonsensical definition for /home link. I repeat, this did not happen after a reboot. It first happened on /home. I thought that might have been related to a new OS handling of the "/home" label. So I deleted the /home link and did a clean reboot. The G2S link was created after that reboot, not before.


After the above two problems happened, I created a new symbolic link


/Users/walker/G2S2 -> /Volumes/L2A/G2S


I then did not use this new symbolic link in any of my processing scripts. A few weeks went by, then this link somehow got corrupted too:


lrwxr-xr-x 1 walker staff 16 Apr 2 17:22 G2S2 -> 꺄G???Gĺ?Gú?G


Does anyone here know how symbolic links are managed on a Mac (any process that controls their linking?), or have any information to help me figure out how to fix this? For example, could it be due to bad RAM? I have 32 GB.
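For anyone who wants to audit their own links rather than stumble on a bad one, here is a minimal Python sketch (not a diagnostic tool) that walks a directory tree and flags symbolic links whose stored target looks damaged. The function name `suspicious_symlinks` and the three heuristics (invalid UTF-8, control characters, target not found) are assumptions made for illustration. Note that a perfectly good link to a volume that is simply not mounted, such as /Volumes/L2A, would also be reported as "target does not exist".

```python
import os
import tempfile


def suspicious_symlinks(root):
    """Yield (path, target, reason) for symlinks under root that look damaged.

    Hypothetical helper: the heuristics are illustrative assumptions only.
    """
    # followlinks=False: list symlinks to directories but never descend into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            # Passing a bytes path makes os.readlink return the raw stored bytes.
            raw = os.readlink(os.fsencode(path))
            try:
                target = raw.decode("utf-8")
            except UnicodeDecodeError:
                yield path, repr(raw), "target is not valid UTF-8"
                continue
            if any(ord(ch) < 32 or ord(ch) == 127 for ch in target):
                yield path, repr(target), "target contains control characters"
            elif not os.path.exists(path):  # exists() follows the link
                yield path, target, "target does not exist"


if __name__ == "__main__":
    # Small self-contained demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        os.symlink(tmp, os.path.join(tmp, "good"))
        os.symlink("/no/such/target", os.path.join(tmp, "dangling"))
        for path, target, reason in suspicious_symlinks(tmp):
            print(f"{os.path.basename(path)} -> {target}: {reason}")
```

Run against a home directory from a scheduled job, something like this would at least record roughly when a link went bad, which might help correlate the damage with whatever else was running at the time.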


Thank you,

Kris Walker

Mac Pro, Mac OS X (10.7.3)

Posted on Apr 20, 2012 3:44 PM

233 replies

Oct 18, 2012 4:29 PM in response to ktwalker69

ktwalker69 wrote:


Of course, no other text files get corrupted. Just links and 10.7+ OS, even after changing hard disks and RAM. For users who don't use symbolic links much, maybe this filing system/O.S. bug doesn't affect them as much.

How would you know? Perhaps the other text files getting corrupted are your Portuguese localization files.


Perhaps you should review where you are buying those hard drives and what other software you have installed.


Malarkey. It is a bug in the O.S. or HFS+ new to 10.7+. Facts supersede faith.

That is an extraordinary claim and extraordinary claims need extraordinary evidence. You don't have any evidence. A handful of anonymous internet postings don't count.

Oct 18, 2012 4:33 PM in response to etresoft

"or the data gets overwritten"


My point exactly. 🙂 It obviously gets overwritten. By what, well that's what we want to know.


As for millions of users not experiencing this problem, it seems to be rather transient. I suspect a concurrency bug. Also, OS X and end-user software simply do not use symlinks much, so I can imagine systems where the user simply doesn't notice the error.


"How would you know? Perhaps the other text files getting corrupted are your Portuguese localization files."


Really. Come now. As if random data corruption would target text files specifically. A random corruption of this magnitude would make the OS unbootable in no time.

I once had random corruption on a Windows box, and it didn't last 5 hours before becoming completely bricked. (Thankfully it was an office computer, haha.)

Oct 18, 2012 4:43 PM in response to Iosepho

Iosepho wrote:


OS X and end-user software simply do not use symlinks much, so I can imagine systems where the user simply doesn't notice the error.

OS X makes extensive use of symbolic links. There are tens of thousands of them at least.


Really. Come now. As if random data corruption would target text files specifically. A random corruption of this magnitude would make the OS unbootable in no time.


There is certainly nothing that would specifically target text files or symbolic links. That was just an example to show how you could have random corruption go on for some time and never notice it. Most likely, it would be one particular area of the disk that is bad. That could manifest itself as small files created at about the same time experiencing the corruption and then no other files afterwards - because those get created on good areas of the disk. If the corruption is happening near the boot sector, then that could make the drive unbootable. If the corruption is happening on previously unused sections of a hard drive, then a new OS install that writes data to those unused areas of the disk for the first time might cause you to notice.
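One way to find out whether ordinary files are being damaged without anyone noticing is to keep a checksum baseline and compare against it later. The `ls` output in the original post shows the damaged link still at 16 bytes, which is the length of /Volumes/L2A/G2S, so the content seems to have changed while the size did not. The Python sketch below looks for exactly that pattern: content that differs although kind, size, and modification time are unchanged. The names `snapshot` and `silently_changed` are hypothetical, and this only detects changes after the fact; it says nothing about the cause.

```python
import hashlib
import os


def snapshot(root):
    """Map each file or symlink under root to (kind, size, mtime_ns, sha256)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe the link itself, not its target
            if os.path.islink(path):
                # Hash the raw bytes stored in the link.
                data = os.readlink(os.fsencode(path))
                digest = hashlib.sha256(data).hexdigest()
                result[path] = ("link", st.st_size, st.st_mtime_ns, digest)
            elif os.path.isfile(path):
                h = hashlib.sha256()
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        h.update(chunk)
                result[path] = ("file", st.st_size, st.st_mtime_ns, h.hexdigest())
    return result


def silently_changed(old, new):
    """Paths whose content differs although kind, size and mtime are unchanged."""
    return sorted(
        path
        for path, (kind, size, mtime, digest) in old.items()
        if path in new
        and new[path][:3] == (kind, size, mtime)
        and new[path][3] != digest
    )
```

The idea would be to store the result of `snapshot()` somewhere safe (for example as JSON on a different disk), take a second snapshot days or weeks later, and pass both to `silently_changed()`. A legitimate edit normally changes the modification time, so anything this reports deserves a closer look.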


I read back through some of the posts in this thread. There is everything from unknown RAID devices to black and white disk failures. It just seems that anyone who has noticed disk corruption through bad symbolic links (whether they admit that or not) is posting in this thread.

Oct 19, 2012 12:01 AM in response to etresoft

There has not been a single report of hard drive failure in this thread. The previous post that may have given you that impression was about a SMARTReporter I/O error caused by the user's Mail app corrupting its links. From the SMARTReporter website: "SMARTReporter does not only check for S.M.A.R.T. disk failure predictions but increases failure prediction accuracy by checking for dangerous I/O errors as well." So this app reports SMART errors and also watches the system logs for any app reporting an I/O error.


I post this only for clarity, so that those who find this thread are not misled into thinking a root cause has been found.


Let's just all agree to move past the failed hard drive scenario. Anyone finding this forum should check their disk for SMART errors and, if any are detected, replace it. Everyone else should keep reading.


J

Oct 19, 2012 1:01 AM in response to ktwalker69

Since our last post on Sep 29, 2012, I just wanted to let you all know that the corruption has stopped completely for us. We wanted to wait before confirming this until we had run a substantial number of tests and enough time had passed to be sure. In our opinion, the problem was that our boot disk (the disk Mountain Lion was installed on) was part of our hardware RAID10. Since we attached a FireWire 800 disk to the server, installed the OS on it to use as the boot disk, and kept the RAID as a data partition, we have seen no trace of corruption. I'm not saying this is the root cause at all, just advising that if you have a similar setup, you could try this 🙂


If any of you have other working solutions, let's try to summarize what worked for you!

Maybe we can then make a collective effort to get this to the attention of Apple.


Summary:


- Mac Pro 5.1 - 32GB

- Mountain Lion 10.8.X

- 4 X 2TB disks configured in a RAID 0 + 1


Corruption stopped after adding an external 2TB FireWire 800 disk used as boot / main disk and using the RAID10 as data disk.

Oct 19, 2012 2:09 AM in response to etresoft

Okay, since etresoft seems to be completely hung up on the corrupt hard drive idea, all evidence aside, let's rule it out once and for all.

I remember that on MS-DOS there was a program called Norton Disk Doctor that scoured your hard drive, sector by sector, and checked for errors, without the need to erase the data on the disk.

Is such an application available these days? I believe it would need to boot on its own, to have the machine to itself.


(BTW, the operating system would be made unbootable by any corruption of its main binaries or resource files, not just the boot sector.)

Oct 19, 2012 4:39 AM in response to Iosepho

DiskWarrior seems to be the current favorite for drive checking and repair. It's on the expensive side, but it can find and correct issues that Disk Utility misses entirely.


Smeevi's post leads me to two separate thoughts:

  1. Driver/firmware issue. Not all disks are created equal with respect to RAIDs. Out-of-date drivers or firmware that cause no problems in standard usage might not be up to the peculiar requirements of running a virtual disk. This would be particularly true if you've just upgraded the OS; it's been my experience that disks often get squirrelly after a new system is installed.
  2. From what I understand, it's a bad idea to run an operating system from a RAID. I assume that's because the OS does a lot of disk access and isn't expecting its data to get written out across multiple physical disks: disk access routines optimized for speed might not play well with distributed data. Is anyone having this problem not running their OS from the RAID?


I'll add as an aside that when I see someone with five points to their name telling off someone with almost twenty thousand I can't help but roll my eyes. I'm not saying that etresoft's assessment is the correct answer, of course, but I'm d@mned certain it's a good answer that's worth some careful thought. Whatever one might think about the silly points system here, dismissing someone who's gotten twenty thousand of them - five/ten points a shot - is a bad idea.

Oct 19, 2012 4:45 AM in response to Iosepho

Iosepho wrote:


Okay, since etresoft seems to be completely hung up on the corrupt hard drive idea, all evidence aside, let's rule it out once and for all.


It is not that I'm hung up on the failed (not corrupt - failed) hard drive idea. It is that other people are hung up on the OS bug idea. A failed hard drive is the most likely cause for this scenario. A flaky RAID controller is also a possibility, but far fewer people have RAID controllers. Surely no one here would ever want to focus solely on the specific issues and conditions reported by the original poster 🙂


I remember that on MS-DOS there was a program called Norton Disk Doctor that scoured your hard drive, sector by sector, and checked for errors, without the need to erase the data on the disk.

Is such an application available these days? I believe it would need to boot on its own, to have the machine to itself.


Modern hard drives are far removed from the old MFM and RLL drives of yore.


(BTW, the operating system would be made unbootable by any corruption of its main binaries or resource files, not just the boot sector.)


That is true, but a boot sector failure is guaranteed death. Failure in the main binaries is likely death. Failure in one particular sector that just happens to hold resource files or links is not guaranteed death. Don't forget that the people who have had that last type of failure are the only ones who can boot, run, and report strange things. Those whose hard drives have failed in other areas, causing an inability to boot, would likely not notice corrupt links.

Oct 19, 2012 5:37 AM in response to etresoft

"A failed hard drive is the most likely cause for this scenario. A flaky RAID controller is also a possibility, but far fewer people have RAID controllers."


Well, the problem I noticed happened once, on a brand new hard drive, after lugging a huge amount of data over from a save on FireWire. A good number of the symlinks of the Java installation done the day before got corrupted.


This is weird for many reasons, notably that nothing but the Java symlinks seem to have been affected, and that neither Java nor its symlinks had been accessed around that time. My first thought was HDD failure as well, but the SMART tool does not report errors. I'd be the happiest if it did, because I have a year of warranty on the bloody thing, but no, it's apparently perfect.


So if it IS a hard drive failure, I'd like a way to prove it. Not just because taking an iMac apart is not something I can do at home, the service costs money and I don't own a car, so lugging it over is a chore, but because if I can prove that the drive is defective, I can get a replacement, otherwise I'm essentially ******. (sorry)


I haven't experienced any sort of breakage since then. Note, though, that I used a handy util to run the HDD cooler at max RPM by default, as the drive was running over 50 degrees Celsius under heavy load. I emailed the service guy who installed the drive about this, and he said it's a perfectly normal temperature for an iMac. I wonder if 55-56 Celsius could result in such corruption in the HDD?

Oct 19, 2012 8:00 AM in response to Iosepho


Iosepho wrote:


I believe that statement to be extremely untrue. At least in the world of servers, running an OS from a raid is the norm.


There's RAIDs and then there's RAIDs. It is normal to run a server from a RAID controller. It would be unusual to run a server from a software RAID. People running servers from a hardware RAID controller who noticed something like corrupt symbolic links probably wouldn't post a question here on Apple Support Communities. They would call Apple/Oracle/Dell/IBM and say "get down here and fix it".


Well, the problem I noticed happened once, on a brand new hard drive, after lugging a huge amount of data over from a save on FireWire. A good number of the symlinks of the Java installation done the day before got corrupted.


The age of a hard drive is not significant. They can fail at any time. Some people have suggested initializing a hard drive with zeros as a way to map out the bad blocks. I don't know if that is necessary or not. I would expect things like SMART to automatically handle such things. I don't know the details of what you were copying from where. I do know that SMART is not enabled on external drives. Perhaps that is the cause. Perhaps performing a zero initialization of external drives is a good idea. I will have to investigate. That could be important.


This is weird for many reasons, notably that nothing but the Java symlinks seem to have been affected, and that neither Java nor its symlinks had been accessed around that time. My first thought was HDD failure as well, but the SMART tool does not report errors. I'd be the happiest if it did, because I have a year of warranty on the bloody thing, but no, it's apparently perfect.


So if it IS a hard drive failure, I'd like a way to prove it. Not just because taking an iMac apart is not something I can do at home, the service costs money and I don't own a car, so lugging it over is a chore, but because if I can prove that the drive is defective, I can get a replacement, otherwise I'm essentially ******. (sorry)

iMac? Are you impacted by the iMac Repair program?


Regardless, SMART is not a definitive proof of either failure or health. If SMART says the drive is dead, then it is. There are other indicators that could indicate failure as well, with random data corruption being right at the top of the list.


I haven't experienced any sort of breakage since then. Note, though, that I used a handy util to run the HDD cooler at max RPM by default, as the drive was running over 50 degrees Celsius under heavy load. I emailed the service guy who installed the drive about this, and he said it's a perfectly normal temperature for an iMac. I wonder if 55-56 Celsius could result in such corruption in the HDD?


I'm skeptical of "handy utils". I prefer to trust Apple's default settings. I challenge Apple with bug reports just as I challenge people here on the forums. Apple engineers have a very good track record of disproving my claims so I have come to put a lot of faith in their judgement, even when it contradicts my own.

Oct 20, 2012 11:56 AM in response to etresoft

Nah, I guess I would have qualified if I had the original retail papers for the Mac. I bought it second hand, and a good two years in, the Seagate started to fail. But that's not the drive I had the trouble with. It hung for 5-10 seconds and made clicking noises, but it never lost data on me. The new HDD isn't Apple certified; it's a WD Green I had a third-party service install (i.e., they are a non-Apple service specializing in Apple devices past their official warranty... normally they are awesome, they can clean and seal a cloudy iMac panel for like $100 instead of replacing it, which is what an Apple-certified service would suggest). 🙂


Oh and needless to say the drive I had trouble with is not external, it's internal.


So anyway, any idea for finding out if the HDD is okay or not? Initializing it with zeros from a rescue disk? Anything else? 🙂
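Short of zeroing the drive, one low-tech check is simply to read every file back and see whether the OS reports an I/O error anywhere. Below is a minimal Python sketch of that idea; the function name `unreadable_files` is hypothetical. It has clear limits: it only touches blocks that currently hold file data, some reads may be served from cache rather than from the platters, permission problems show up as errors too, and a clean run does not prove the drive is healthy.

```python
import os
import tempfile


def unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root; return [(path, error)] for failures."""
    failures = []

    def note_walk_error(exc):
        # Directories that cannot be listed are recorded instead of skipped.
        failures.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=note_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files (sockets, FIFOs, devices).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    # Read to end of file, discarding the data.
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures


if __name__ == "__main__":
    # Demo on a throwaway directory; point it at a real volume to use it.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sample.bin"), "wb") as fh:
            fh.write(b"\0" * 4096)
        for path, error in unreadable_files(tmp):
            print(f"{path}: {error}")
```

Any path that this reports with an input/output error (rather than a permission error) would be something concrete to show a warranty desk, along with the matching entries in the system log.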

Oct 21, 2012 8:07 AM in response to Iosepho

Under those circumstances, I would just assume it is bad. Those "green" drives are particularly nasty. They cause enough trouble as externals. I wouldn't use one as an internal. When it comes down to it, I don't think you can really do the kind of enclosure and drive swapping that we all used to do. Modern drives are manufactured to be so cheap and so high performance that they have all kinds of different techniques to save money, energy, and time. There is no way to tell which ones will work in which configurations. One way to tell that it doesn't work would be some really strange behaviour like corrupted symbolic links.


I have been guilty of this myself. I bought one of those "green" drives for use in Time Machine, swapping it into an old enclosure. It immediately started randomly disconnecting itself with a big red error dialog on the Mac. I returned that drive for a different one that doesn't cause that disconnect error as much. Where is the problem? Drive? Enclosure? Cable? I have no idea. I just know that now I need to fix it and I can't really tell for myself which drive is going to work. I am just waiting for the price of Thunderbolt externals to come down a bit and I'll migrate over to that and shift the known, good drives down the food chain to the old non-Thunderbolt machine.


So, I don't have a definitive answer for you. I can't say that yes the green drive is causing the problem, but I sure wouldn't be surprised.

