233 Replies Latest reply: Mar 9, 2015 12:02 PM by Jyri Palm
  • ktwalker69 Level 1 (0 points)

    etresoft wrote:

     

    Regardless, the full path the link points to must reside somewhere on the disk. The only way that data gets changed is if the hard drive physically fails or the data gets overwritten. Considering there are several million users of Mountain Lion not experiencing this problem, then the cause is either a failing hard disk or incompatible system-level software that the rest of us don't have.

     

    Malarkey.  It is a bug in the O.S. or HFS+ new to 10.7+.  Facts supersede faith.

  • etresoft Level 7 (26,865 points)

    ktwalker69 wrote:

     

    Of course, no other text files get corrupted.  Just links and 10.7+ OS, even after changing hard disks and RAM.  For users who don't use symbolic links much, maybe this filing system/O.S. bug doesn't affect them as much.

    How would you know? Perhaps the other text files getting corrupted are your Portuguese localization files.

     

    Perhaps you should review where you are buying those hard drives and what other software you have installed.

     

    Malarkey.  It is a bug in the O.S. or HFS+ new to 10.7+.  Facts supersede faith.

    That is an extraordinary claim and extraordinary claims need extraordinary evidence. You don't have any evidence. A handful of anonymous internet postings don't count.

  • Iosepho Level 1 (0 points)

    "or the data gets overwritten"

     

    My point exactly. It obviously gets overwritten. By what, well that's what we want to know.

     

    As for millions of users not experiencing this problem, it seems to be rather transient. I suspect a concurrency bug. Also, OS X and end-user software simply do not use symlinks much, so I can imagine systems where the user simply doesn't notice the error.

     

    "How would you know? Perhaps the other text files getting corrupted are your Portuguese localization files."

     

    Really. Come now. As if random data corruption would target text files specifically. A random corruption of this magnitude would make the OS unbootable in no time.

    I once had random corruption on a Windows box, and it didn't last 5 hours before becoming completely bricked. (Thankfully it was an office computer, haha.)

  • etresoft Level 7 (26,865 points)

    Iosepho wrote:

     

    OS X and end-user software simply do not use symlinks much, so I can imagine systems where the user simply doesn't notice the error.

    OS X makes extensive use of symbolic links. There are tens of thousands of them at least.

     

    Really. Come now. As if random data corruption would target text files specifically. A random corruption of this magnitude would make the OS unbootable in no time.

     

    There is certainly nothing that would specifically target text files or symbolic links. That was just an example to show how you could have random corruption go on for some time and never notice it. Most likely, it would be one particular area of the disk that is bad. That could manifest itself as small files created at about the same time experiencing the corruption and then no other files afterwards - because those get created on good areas of the disk. If the corruption is happening near the boot sector, then that could make the drive unbootable. If the corruption is happening on previously unused sections of a hard drive, then a new OS install that writes data to those unused areas of the disk for the first time might cause you to notice.

     

    I read back through some of the posts in this thread. There is everything from unknown RAID devices to black and white disk failures. It just seems that anyone who has noticed disk corruption through bad symbolic links (whether they admit that or not) is posting in this thread.
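
    For anyone who wants to check their own disk for bad symbolic links rather than argue about it, here is a minimal sketch in Python. It only uses the standard library. The function name `scan_symlinks` is made up for this post, and the script only flags links whose target no longer exists; it cannot tell you why a link went bad.

```python
import os


def scan_symlinks(root):
    """Walk `root` without following links.

    Returns (total, broken), where `total` is the number of symbolic links
    found and `broken` is a list of (link_path, raw_target) pairs for links
    whose target cannot be resolved.
    """
    total = 0
    broken = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Links to directories show up in dirnames, all other links
        # (including dangling ones) show up in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            total += 1
            try:
                target = os.readlink(path)
            except OSError as exc:
                # The link itself could not be read at all.
                broken.append((path, "<unreadable: %s>" % exc))
                continue
            # os.path.exists follows the link, so it is False when the
            # target is missing or the stored path is garbage.
            if not os.path.exists(path):
                broken.append((path, target))
    return total, broken
```

    Calling `scan_symlinks("/usr")` (or any other directory you can read) gives you the link count and the list of dangling links with the raw text stored in each one, which is the part that people in this thread report as garbled.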

  • Jmanis Level 1 (10 points)

    There's not been a single report of hard drive failure on this thread. The previous post that may have given you that impression was about a SMARTReporter I/O error caused by the user's mail app corrupting its links. From the SMARTReporter website: "SMARTReporter does not only check for S.M.A.R.T. disk failure predictions but increases failure prediction accuracy by checking for dangerous I/O errors as well."  So this app reports SMART errors and watches the system logs for any app reporting an I/O error.
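
    To make the "watches the system logs" part concrete, here is a rough sketch of that kind of check in Python. This is not SMARTReporter's actual implementation, which I have not seen. The function name `find_io_errors` and the idea of matching the plain text "I/O error" are my own assumptions, and the exact wording in your logs may differ.

```python
import re

# Assumption: log entries of interest contain the literal text "I/O error"
# in some capitalization. Adjust the pattern to match your own logs.
IO_ERROR = re.compile(r"i/o error", re.IGNORECASE)


def find_io_errors(lines):
    """Return every line from `lines` that mentions an I/O error."""
    return [line.rstrip("\n") for line in lines if IO_ERROR.search(line)]
```

    You would feed it an open log file, for example `find_io_errors(open(path, errors="replace"))`, where `path` is wherever your system keeps its log. A hit only tells you that some process reported an I/O error, not which component is at fault.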

     

    I post this only for clarity, so that those who find this thread are not misled into thinking a root cause has been found.

     

    Let's just all agree to move past the failed hard drive scenario. Anyone finding this forum should check their disk for SMART errors and, if any are detected, replace the drive. Everyone else should keep reading.

     

    J

  • Smeevil Level 1 (0 points)

    Since our last post on Sep 29, 2012, I just wanted to let you all know that the corruption has stopped completely for us. We wanted to wait before confirming this until we had run a substantial number of tests and let some time pass to be sure. In our opinion, the problem was that our boot disk (the disk Mountain Lion was installed on) was part of our hardware RAID10. Since we attached a FireWire 800 disk to the server, used it as the boot/install disk, and kept the RAID as a data partition, we have lost all traces of corruption. I'm not saying this is the root cause at all, just advising that if you have a similar setup, you could try this.

     

    If any of you have other working solutions, let's try to summarize what worked for you!

    Maybe we can then make a collective effort to get this to the attention of Apple.

     

    Summary :

     

    - Mac Pro 5.1 - 32GB

    - Mountain Lion 10.8.X

    - 4 X 2TB disks configured in a RAID 0 + 1

     

    Corruption stopped after adding an external 2TB FireWire 800 disk used as the boot / main disk and using the RAID10 as the data disk.

  • Iosepho Level 1 (0 points)

    Okay, since etresoft seems to be completely hung up on the corrupt hard drive idea, all evidence aside, let's rule it out once and for all.

    I remember that on MS-DOS, there was a program called Norton Disk Doctor that scoured your hard drive, sector by sector, and checked for errors, without the need to erase the data on the disk.

    Is such an application available these days? I believe it would need to boot on its own, to have the machine to itself.

     

    (BTW, the operating system would be made unbootable by any corruption of its main binaries or resource files, not just the boot sector.)

  • twtwtw Level 5 (4,900 points)

    DiskWarrior seems to be the current favorite for drive checking and repair.  It's on the expensive side, but it can find and correct issues that Disk Utility misses entirely.

     

    Smeevil's post leads me to two separate thoughts:

    1. Driver/firmware issue.  Not all disks are created equal with respect to raids.  Out-of-date drivers or firmware that may not cause problems in standard usage might not be up to the peculiar requirements of running a virtual disk.  This would be particularly true if you've just upgraded the OS; it's been my experience that disks often get squirrelly after a new system is installed.
    2. From what I understand, it's a bad idea to run an operating system from a raid. I assume that's because the OS does a lot of disk access, and isn't expecting its data to get written out across multiple physical disks: disk access routines optimized for speed might not play well with distributed data.  Is anyone having this problem not running their OS from the raid?

     

    I'll add as an aside that when I see someone with five points to their name telling off someone with almost twenty thousand I can't help but roll my eyes.  I'm not saying that etresoft's assessment is the correct answer, of course, but I'm d@mned certain it's a good answer that's worth some careful thought.  Whatever one might think about the silly points system here, dismissing someone who's gotten twenty thousand of them - five/ten points a shot - is a bad idea.

  • etresoft Level 7 (26,865 points)

    Iosepho wrote:

     

    Okay, since etresoft seems to be completely hung up on the corrupt hard drive idea, all evidence aside, let's rule it out once and for all.

     

    It is not that I'm hung up on the failed (not corrupt - failed) hard drive idea. It is that other people are hung up on the OS bug idea. A failed hard drive is the most likely cause for this scenario. A flaky RAID controller is also a possibility, but far fewer people have RAID controllers. Surely no one here would ever want to focus solely on the specific issues and conditions reported by the original poster.

     

    I remember that on MS-DOS, there was a program called Norton Disk Doctor that scoured your hard drive, sector by sector, and checked for errors, without the need to erase the data on the disk.

    Is such an application available these days? I believe it would need to boot on its own, to have the machine to itself.

     

    Modern hard drives are far removed from the old MFM and RLL drives of yore.

     

    (BTW, the operating system would be made unbootable by any corruption of its main binaries or resource files, not just the boot sector.)

     

    That is true, but a boot sector failure is guaranteed death. Failure in the main binaries is likely death. Failure in one particular sector that just happens to have resource files or links is not a guaranteed death. Don't forget that people who have had the last type of failure are the only ones who can boot, run, and report strange things. Those whose hard drives have failed in other areas, causing an inability to boot, would likely not notice corrupt links.

  • Iosepho Level 1 (0 points)

    "From what I understand, it's a bad idea to run an operating system from a raid."

     

    I believe that statement to be extremely untrue. At least in the world of servers, running an OS from a raid is the norm.

  • twtwtw Level 5 (4,900 points)

    Well, norm or not, it still bears investigation.  Are you running your OS from your raid? 

     

    And please, things are either true or untrue.  'Extremely' serves no purpose in that sentence except to express an emotion, and we could do with a little less of that in this thread.

  • Iosepho Level 1 (0 points)

    "A failed hard drive is the most likely cause for this scenario. A flaky RAID controller is also a possibility, but far fewer people have RAID controllers."

     

    Well the problem I noticed happened once, on a brand new hard drive, after lugging a huge amount of data over from a save on Firewire. A good number of the symlinks of the Java installation done the day before got corrupted.

     

    This is weird for many reasons, notably that nothing but the Java symlinks seem to have been affected, and that neither Java nor its symlinks had been accessed around that time. My first thought was HDD failure as well, but the SMART tool does not report errors. I'd be the happiest if it did, because I have a year of warranty on the bloody thing, but no, it's apparently perfect.

     

    So if it IS a hard drive failure, I'd like a way to prove it. Not just because taking an iMac apart is not something I can do at home, the service costs money and I don't own a car, so lugging it over is a chore, but because if I can prove that the drive is defective, I can get a replacement, otherwise I'm essentially ******. (sorry)

     

    I haven't experienced any sort of breakage since then. Note though that I revved up the HDD cooler to max RPM by default with a handy util, as it was running over 50 degrees Celsius under heavy load. I mailed the service guy about this who installed the drive, and he said it's perfectly normal temperature for an iMac. I wonder if 55-56 Celsius could result in such corruption in the HDD?
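
    One thing that would help pin down when links change is to record them while they are known to be good and compare later. Below is a minimal sketch in Python, standard library only. The function names (`snapshot_links`, `save_snapshot`, `load_snapshot`, `diff_snapshots`) and the JSON file layout are invented for this post, not part of any existing tool.

```python
import json
import os


def snapshot_links(root):
    """Map every symbolic link under `root` to the raw target stored in it."""
    links = {}
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                try:
                    links[path] = os.readlink(path)
                except OSError:
                    # Record unreadable links too, so they show up in a diff.
                    links[path] = None
    return links


def save_snapshot(links, out_file):
    """Write a snapshot to `out_file` as JSON."""
    with open(out_file, "w", encoding="utf-8") as fh:
        json.dump(links, fh, indent=2, sort_keys=True)


def load_snapshot(in_file):
    """Read a snapshot previously written by save_snapshot."""
    with open(in_file, "r", encoding="utf-8") as fh:
        return json.load(fh)


def diff_snapshots(old, new):
    """Return (changed, removed, added) between two snapshots.

    `changed` maps a link path to (old_target, new_target).
    """
    changed = {p: (old[p], new[p]) for p in old if p in new and old[p] != new[p]}
    removed = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, removed, added
```

    Take a snapshot right after installing something like Java, save it, and run `diff_snapshots(load_snapshot(saved), snapshot_links(root))` after the next big copy. Anything in `changed` that you did not modify yourself is a link whose stored path was altered, and the timestamps of the two snapshots bracket when it happened.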

  • ktwalker69 Level 1 (0 points)

    twtwtw, I know you weren't asking me, but I was not running my OS from a RAID partition, though a RAID partition did exist on my computer for data storage.  And the corrupted links were on my OS partition pointing to the RAID partition.

  • etresoft Level 7 (26,865 points)

     

    Iosepho wrote:

     

    I believe that statement to be extremely untrue. At least in the world of servers, running an OS from a raid is the norm.

     

    There's RAIDs and then there's RAIDs. It is normal to run a server from a RAID controller. It would be unusual to run a server from a software RAID. People running servers from a hardware RAID controller who noticed something like corrupt symbolic links probably wouldn't post a question here on Apple Support Communities. They would call Apple/Oracle/Dell/IBM and say "get down here and fix it".

     

    Well the problem I noticed happened once, on a brand new hard drive, after lugging a huge amount of data over from a save on Firewire. A good number of the symlinks of the Java installation done the day before got corrupted.

     

    The age of a hard drive is not significant. They can fail at any time. Some people have suggested initializing a hard drive with zeros as a way to map out the bad blocks. I don't know if that is necessary or not. I would expect things like SMART to automatically handle such things. I don't know the details of what you were copying from where. I do know that SMART is not enabled on external drives. Perhaps that is the cause. Perhaps performing a zero initialization of external drives is a good idea. I will have to investigate. That could be important.

     

    This is weird for many reasons, notably that nothing but the Java symlinks seem to have been affected, and that neither Java nor its symlinks had been accessed around that time. My first thought was HDD failure as well, but the SMART tool does not report errors. I'd be the happiest if it did, because I have a year of warranty on the bloody thing, but no, it's apparently perfect.

     

    So if it IS a hard drive failure, I'd like a way to prove it. Not just because taking an iMac apart is not something I can do at home, the service costs money and I don't own a car, so lugging it over is a chore, but because if I can prove that the drive is defective, I can get a replacement, otherwise I'm essentially ******. (sorry)

    iMac? Are you impacted by the iMac Repair program?

     

    Regardless, SMART is not a definitive proof of either failure or health. If SMART says the drive is dead, then it is. There are other indicators that could indicate failure as well, with random data corruption being right at the top of the list.

     

    I haven't experienced any sort of breakage since then. Note though that I revved up the HDD cooler to max RPM by default with a handy util, as it was running over 50 degrees Celsius under heavy load. I mailed the service guy about this who installed the drive, and he said it's perfectly normal temperature for an iMac. I wonder if 55-56 Celsius could result in such corruption in the HDD?

     

    I'm skeptical of "handy utils". I prefer to trust Apple's default settings. I challenge Apple with bug reports just as I challenge people here on the forums. Apple engineers have a very good track record of disproving my claims so I have come to put a lot of faith in their judgement, even when it contradicts my own.

  • Iosepho Level 1 (0 points)

    Nah, I guess I would have qualified, if I had the original retail papers of the Mac. I bought it second hand, and a good two years in the Seagate started to fail. But it's not the one I had the trouble with. It hung for 5-10 seconds and made clicking noises, but it never lost data on me. The new HDD isn't Apple certified, it's a WD Green I had a third party service install (i.e. they are a non-Apple service, specializing in Apple devices past their official warranty... normally they are awesome, they can clean and seal a cloudy iMac panel for like $100 instead of replacing it - what an Apple certified service would suggest).

     

    Oh and needless to say the drive I had trouble with is not external, it's internal.

     

    So anyway, any idea for finding out if the HDD is okay or not? Initializing it with zeros from a rescue disk? Anything else?
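
    In the meantime, a non-destructive check in the spirit of the old sector-by-sector scanners would be to read the whole disk (or a big file) from start to end and note where the reads fail. Here is a minimal sketch in Python, under a few assumptions: the function name `read_scan` and its parameters are made up for this post, reading a raw device needs administrator rights, and the device path is something you have to look up yourself. It never writes anything.

```python
import os


def read_scan(path, block_size=1024 * 1024, max_errors=100):
    """Read `path` from start to end in fixed-size blocks.

    Returns (bytes_read, bad_offsets). `bad_offsets` lists the byte offsets
    of blocks where the read raised an I/O error. The scan gives up after
    `max_errors` failures so a vanished device cannot loop forever.
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                chunk = os.read(fd, block_size)
            except OSError:
                # Unreadable block: remember where it was and skip past it.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

    Keep in mind what this can and cannot show. A non-empty `bad_offsets` is decent evidence of unreadable sectors. An empty one only means every block could be read; it does not prove that the data returned was the data originally written.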
