
How to Determine which DIMM to Replace Given Kernel Panic Report

I have a dual-CPU, mid-2010 Mac Pro with 32 GB of Kingston RAM (eight 4 GB modules). Once a week or so, it suffers a kernel panic that appears to be caused by a RAM problem. Here is a typical one:


Mon Jul 18 10:01:26 2011

Machine-check capabilities (cpu 17) 0x0000000000001c09:

family: 6 model: 44 stepping: 2 microcode: 15

Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

9 error-reporting banks

threshold-based error status present

extended corrected memory error handling present

Machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

Package 1 logged:

IA32_MC8_STATUS(0x421): 0xfe0000800001009f valid

Channel number: 15 (unknown)

Memory Operation: read

Machine-specific error: Read ECC

COR_ERR_CNT: 4

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

Error overflow

IA32_MC8_ADDR(0x422): 0x0000000408aa4340

IA32_MC8_MISC(0x423): 0x29d7aef000040383

DIMM: 0

Channel: 1

Syndrome: 0x29d7aef0

Package 0 logged:

IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid


They all point to DIMM 0, Channel 1, and indicate addresses like 0x00000004xxxxxxxx. My question is: If I am interested in replacing the suspect DIMM, which one is it?
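For anyone who wants to check how the report arrives at those fields, here is a minimal Python sketch that decodes the two register values shown above. The bit positions are assumptions based on the general IA32_MCi_STATUS layout and the Nehalem/Westmere-style MC8 MISC layout; they were checked only against the values this report itself prints. The function names are made up for illustration. It reproduces "DIMM 0, Channel 1" but says nothing about which physical slot that is.

```python
# Assumed bit positions; verified only against the panic report in this post.
STATUS_FLAGS = {
    63: "valid",
    62: "Error overflow",
    61: "Uncorrected error",
    60: "Error enabled",
    59: "MISC register valid",
    58: "ADDR register valid",
    57: "Processor context corrupt",
}
MEMORY_OPS = {0: "generic", 1: "read", 2: "write",
              3: "address/command", 4: "scrub"}


def decode_status(status):
    """Split an MC8 status value into flag names and memory-error fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items()
                  if (status >> bit) & 1],
        # Low 16 bits of the form 0000 0000 1MMM CCCC mark a memory
        # controller error: MMM = operation, CCCC = channel.
        "is_memory_controller_error": (status & 0xFF80) == 0x0080,
        "channel_number": status & 0xF,  # 15 means "not specified"
        "memory_operation": MEMORY_OPS.get((status >> 4) & 0x7, "reserved"),
        "read_ecc": bool((status >> 16) & 1),  # model-specific bit (assumed)
    }


def decode_misc(misc):
    """Pull the DIMM, channel and syndrome fields out of an MC8 MISC value."""
    return {
        "dimm": (misc >> 16) & 0x3,
        "channel": (misc >> 18) & 0x3,
        "syndrome": (misc >> 32) & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    print(decode_status(0xFE0000800001009F))
    print(decode_misc(0x29D7AEF000040383))
```

Note that the status register's own channel field is 15 ("unknown"); the DIMM and channel numbers come from the MISC register, and they are relative to the package that logged the error (Package 1 here).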


The User Guide indicates that I have 8 memory slots, organized in two banks, each bank adjacent to a CPU. The diagram says that Slots 1-4 are adjacent to one CPU, and Slots 5-8 are adjacent to the other. I, of course, filled all of them myself. System Profiler tells me about DIMMs 1-8. The kernel panic report seems to tell me that DIMM 0 is to blame, perhaps DIMM 0 in Channel 1. Here is a representation of the slot layout in the User Guide:


slot 5    CPU
slot 6    slot 4
slot 7    slot 3
slot 8    slot 2
CPU       slot 1
latch     latch



where the bottom of the diagram is toward the side of the tray that has the latches that release the tray. The form of answer I'd like to see is something like: Replace the DIMM in Slot 1. Or perhaps: Replace the DIMMs in slot 1 and slot 5.

Mac Pro, Mac OS X (10.6.8)

Posted on Jul 19, 2011 10:45 AM

9 replies

Jul 19, 2011 1:06 PM in response to Richard Gabriel

When it kernel panics, do not shut it off. Remove the side door while it is still running, and note which DIMM has the error LED lit. The door is not interlocked, and there are no hazardous voltages accessible to your unaided fingers, but components may be hot.


Alternatively, Apple System Profiler will indicate (with Status other than "OK") which module(s) are giving single bit errors after it has been running for a while.

Jul 19, 2011 1:42 PM in response to The hatter

I have run AHT a number of times in Extended Loop. Only once did it report an error, which was this:


4MEM/66/40000000: 0x6f798818


The fellow at the Apple Genius Bar was unable to decode the message - only that it indicated a memory-related error.


Memtest has never failed.


I take from the notes from Grant Bennet-Alder and The hatter that it is not possible from the information provided in the kernel panic log to determine which memory DIMM is at fault. Is this correct?

Jul 19, 2011 4:30 PM in response to The hatter

Perhaps I wasn't clear. The Genius *did* look it up - it took him about 5 actual minutes - and he said that all he could tell was that it was a memory error of some sort. I was asking him specifically what the 4 and 66 meant, as well as the hex number. Even with a ton of googling, I was unable to find much about it.


I know quite a lot about how to eventually find the bad DIMM, if there is one. The machine in question is used heavily for Photoshop, InDesign, audio editing, some video editing, and programming, and if the memory failures (or memory controller errors) were more frequent, I would engage in classical isolation techniques - and I probably will have to. It simply seemed to me that the information I need is in the kernel panic logs.

Jul 19, 2011 4:34 PM in response to Grant Bennet-Alder

I didn't know about the error LED. I'll still have to do some isolation - though not much - because after opening the side door, I can see only one set of the memory modules - the other four are on the backside of a CPU which blocks the view. Perhaps a dentist's mirror would work. To get to the other four I need to slide out the CPU tray, which interrupts power.


If it's on the side I can see, then I know what to replace on this side. If not, I can swap the four on the frontside with the four on the backside and wait for another failure.
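To make the swap test easier to read afterwards, the saved panic reports could be tallied by the package, channel and DIMM they name. If the reported package changes after the swap, the fault followed the module; if it stays put, the slot or memory controller is the more likely suspect. This is only a rough Python sketch: the line formats it matches ("Package N logged:", "DIMM: N", "Channel: N") are assumed from the single report pasted above, and `tally_locations` is a made-up name.

```python
import re
from collections import Counter

# Line formats assumed from the one panic report quoted in this thread.
PACKAGE_RE = re.compile(r"^Package (\d+) logged:")
FIELD_RE = re.compile(r"^(DIMM|Channel): (\d+)$")


def tally_locations(report_texts):
    """Count (package, channel, dimm) triples across panic report texts."""
    counts = Counter()
    for text in report_texts:
        package = None
        fields = {}
        for raw in text.splitlines():
            line = raw.strip()
            match = PACKAGE_RE.match(line)
            if match:
                # A new package section starts; forget partial fields.
                package = int(match.group(1))
                fields = {}
                continue
            match = FIELD_RE.match(line)
            if match:
                fields[match.group(1)] = int(match.group(2))
                if len(fields) == 2:
                    key = (package, fields["Channel"], fields["DIMM"])
                    counts[key] += 1
                    fields = {}
    return counts


if __name__ == "__main__":
    sample = """Package 1 logged:
    IA32_MC8_STATUS(0x421): 0xfe0000800001009f valid
    Channel number: 15 (unknown)
    DIMM: 0
    Channel: 1
    Package 0 logged:
    IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid
    """
    for location, count in tally_locations([sample]).items():
        print(location, count)
```

The "Channel number: 15 (unknown)" line is deliberately not matched, since that field comes from the status register rather than the MISC register.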
