For three months, I’ve been in the same boat as many of you (a 2012 Retina MacBook Pro with increasing GPU panics after installing Yosemite), but I’ve held back posting until I could provide the full results of my testing. Details below, but first my high-level conclusions.
This problem is due to a progressive hardware failure of the Nvidia GPU or its interface to the logic board. Given that the problem appears to affect only some MacBookPro10,1 laptops, and has also been reported on 2013 and 2014 MacBook Pros, I think the root cause is likely a manufacturing defect in the Nvidia GPU or in the MacBook Pro itself. But I think it’s also quite possible that the issue only manifests itself, and gets progressively worse, when the MacBook Pro exceeds some thermal limit.
While I know it's tempting to blame this problem on Yosemite given the time frame—I was there myself—from my evidence it’s very clear this is not a problem with Yosemite. I rolled back to Mavericks and the problem persisted. With my new logic board, Yosemite is running fine.
I should also mention that I was ultimately able to fix the problem by replacing the logic board. Like many people here, I had my unit rejected for warranty repair by an Apple Store, which claimed liquid contact, even though I know this has never happened. I spent $50 to have an Independent Apple Service Provider (IASP) clean out the dust and replace the thermal paste (per vadeskoc, although this did not fix the problem), and in the process gathered evidence that refuted the Apple Store’s claim. With the photographic evidence I provided (http://imgur.com/a/cHDOS), I went back to the Apple Store, and they replaced the logic board, both fans, and the heatsink under warranty.
- I bought the MacBook Pro in Jan 2013. Just a year later, while still running Mavericks, I experienced what I believe was the same issue. Apple fixed it by replacing the logic board and heatsink. Here’s an excerpt from the problem description filed by the Apple Genius: “Customer reports that during use screen will go black. He will have to do a hard shut down … Customer had a error report that showed GPU panics”
- After this, my MacBook Pro worked flawlessly until mid-Nov 2014, a month after installing Yosemite. Safari rendering was wonky, and iPhoto would occasionally bomb with GPU panics. By mid-Dec 2014, this had progressed to frequent GPU panics in other apps, and after each GPU panic I’d have to wait 5-10 minutes before the laptop would boot properly again.
- I also determined that I could consistently reproduce the problem within 5 minutes by running the “furry donut” stress test in the popular “GPU Test” app. And I determined that I could prevent the problem completely by using gfxCardStatus to use only the integrated Intel GPU. In other words, I believe the core issue lies with the Nvidia GPU itself rather than with switching between GPUs, although switching could be a second, related failure mode.
- Using Mac Fan Control, I determined that manually boosting my fan speeds to max did not fix the problem. I did notice one anomaly that I think may be important: the temperature shown by the “GPU diode” sensor was all over the place, especially once I started running the Nvidia GPU. I would see fluctuations of 15°C, up and down, within a single second. In contrast, on my newly fixed logic board I don’t see any such fluctuations: when I run the stress test, it starts at around 40°C and then rises steadily to 90°C, at which point the fans are on full and bring it down to a steady 80°C.
- Like many of you, I strongly suspected this might be a software issue, so I rolled back and did a clean install of Mavericks (completely clean, just built-in apps and no custom data or additions). The problem persisted. If this is a software issue, as many are still claiming, it also exists in Mavericks. If you don’t believe me, I recommend you try the same.
- I also suspected that the problem could be caused by the new EFI boot ROM surreptitiously installed with Yosemite (MBP101.00EE.B05). Apple’s EFI boot ROM page shows that the MacBookPro10,1 should be using MBP101.00EE.B02. So I rolled back to this boot ROM. The problem persisted. (Three months ago I reported to Apple not only that this page was out of date, but also that the B03, B05, and (now) B07 boot ROM updates for the MacBookPro10,1 are not available online anywhere; they still haven’t addressed this.)
- Having exhausted the likely software options, I tried to eliminate thermal issues as the culprit by implementing the vadeskoc fix: using an IASP (so the MacBook Pro remained under warranty), I had them clean out the dust and replace the thermal paste over the GPU and CPU. The Mac ran much cooler, but the problem persisted. I agree that the thermal paste application, undoubtedly performed during the January 2014 repair, looked sloppy; in fact, what Apple was claiming as liquid contact appeared to be thermal paste.
- When Apple returned my repaired laptop with the new logic board, it was still running my Mavericks install, although the new logic board had the B03 boot ROM. First, I set gfxCardStatus to use only the Nvidia GPU and then ran the GPU stress test for an hour, with no failures. I then upgraded to Yosemite, which also updated the boot ROM to the new B07 version, and repeated this test, again with no failures.
The fact that the GPU diode temperature sensor had apparently failed on my unit is very suspicious, especially since it didn’t show up in Apple’s tests. While this could be coincidence or a symptom, what I’m wondering now is whether it could be a primary or secondary cause of the problem. If this temperature sensor was failing, as it appears, it’s possible that the GPU is getting too hot, either damaging the GPU itself or worsening a manufacturing defect in the GPU interface. If so, the problem would be exacerbated by everything reported here: dust in the case, a bad thermal paste job, and increased Nvidia GPU utilization in Yosemite.
I also wonder why Apple lets the Nvidia GPU get all the way to 90°C on my repaired unit before bringing it down to 80°C. While this lasts less than a minute, it seems like they should ramp the fans up higher and sooner to avoid thermally stressing this area, given the problems many of us are having.
I also wonder if the fact that a number of us shelled out $3300 for this laptop is relevant, since it means we bought custom units (16 GB RAM, 768 GB SSD, fast processor), which could mean we all have logic boards (and even replacements) from a particular, relatively small batch that could have had defects not seen in the standard SKUs. And, of course, it also means our laptops run hotter.
Finally, because I don’t think Apple has resolved this issue yet (although I suspect they are aware of it), I may well be back in the same situation a year or less from now. To try to prevent this, I’m going to keep an eye on the GPU diode temperature sensor to see if it starts failing. I’m also going to buy a pentalobe screwdriver so I can open the bottom case every month and blow out the dust.
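For anyone who wants to watch for the same symptom, a minimal Python sketch of such a check might look like the following. It assumes you have already logged GPU diode readings as (seconds, °C) pairs by some means; nothing here reads the sensor itself, and how you capture the readings is up to you. It flags consecutive readings taken within one second of each other that differ by 15°C or more, which matches the kind of jump described for the failing board. The function name `find_erratic_jumps` and its default thresholds are illustrative assumptions, not part of any Apple or Mac Fan Control tool.

```python
from typing import List, Tuple

# One logged reading: (time in seconds, GPU diode temperature in °C).
# Hypothetical format; adapt to however you actually record readings.
Sample = Tuple[float, float]


def find_erratic_jumps(
    samples: List[Sample],
    max_delta_c: float = 15.0,  # assumed threshold, taken from the ~15°C swings described above
    window_s: float = 1.0,      # only compare readings at most this many seconds apart
) -> List[Tuple[float, float]]:
    """Return (time, delta) for each pair of consecutive readings taken within
    window_s seconds of each other whose temperatures differ by at least max_delta_c."""
    jumps: List[Tuple[float, float]] = []
    # Walk over consecutive pairs of readings.
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        delta = c1 - c0
        if 0 < dt <= window_s and abs(delta) >= max_delta_c:
            jumps.append((t1, delta))
    return jumps


# A healthy trace: a steady 1°C-per-second rise from 40°C to 90°C.
healthy = [(float(t), 40.0 + t) for t in range(51)]
# An erratic trace: swings of roughly 15°C every half second.
erratic = [(0.0, 55.0), (0.5, 70.0), (1.0, 54.0), (1.5, 69.0)]

print(find_erratic_jumps(healthy))  # no jumps flagged
print(find_erratic_jumps(erratic))  # three jumps flagged
```

A steady rise like the one on the repaired board produces no flags, while a trace that swings by 15°C within a second produces several. A hit is a reason to look more closely, not a diagnosis.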