
Help deciphering Mac Pro kernel panic report

Hi all,


My Mac Pro (early 2009, quad core 2.93GHz, NVIDIA GeForce GTX 285) has had 8 kernel panics with automatic restarts over the past couple of weeks. All panic logs show hardware errors of the "Read ECC" kind, though the affected cores vary (3x CPU 0, 2x CPU 2, 1x each CPU 4, 6 and 7). I am hoping that it's only one of the DIMMs gone bad (I was thinking about upgrading the RAM anyway), but I fear it might be the memory controller, one of the caches or the CPU itself.


There seem to be quite a few knowledgeable people around, so could you please have a look at the panic log and give me a hint towards the culprit? It would be much appreciated.


Best regards, Armin


----------8<----------


Panic (system crashes) log:


Source: /Library/Logs/DiagnosticReports/Kernel_2013-01-20-163939_Kronos.panic

Size: 9 KB (8,803 bytes)

Last Modified: 20.01.13 16:39

Recent Contents: Sun Jan 20 16:39:39 2013

Machine-check capabilities 0x0000000000001c09:

family: 6 model: 26 stepping: 5 microcode: 17

Intel(R) Xeon(R) CPU W3540 @ 2.93GHz

9 error-reporting banks

threshold-based error status present

extended corrected memory error handling present

Processor 0: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 1: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 2: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 3: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 4: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

Package 0 logged:

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

Channel number: 15 (unknown)

Memory Operation: read

Machine-specific error: Read ECC

COR_ERR_CNT: 0

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

RTID: 136

DIMM: 0

Channel: 1

Syndrome: 0xd16986de

Processor 5: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 6: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 7: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

MCA error code: 0x009f

Model specific error code: 0x0001

Other information: 0x00000000

Threshold-based status: Undefined

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

panic(cpu 5 caller 0xffffff80034b83c9): "Machine Check at 0xffffff7f85320e87, registers:\n" "CR0: 0x000000008001003b, CR2: 0x00007fdc16020000, CR3: 0x0000000005d01000, CR4: 0x0000000000000660\n" "RAX: 0x0000000000000020, RBX: 0xffffff801444b800, RCX: 0x0000000000000001, RDX: 0x0000000000000000\n" "RSP: 0xffffff80dac5bd70, RBP: 0xffffff80dac5bda0, RSI: 0x0000000000000006, RDI: 0x0000000000000006\n" "R8: 0x0000000000000000, R9: 0x7ffffffffffffffe, R10: 0xffffff8015e72b28, R11: 0x0000000000000246\n" "R12: 0x0000000000000006, R13: 0xffffff80141ef540, R14: 0x0000000000000000, R15: 0x00000000000007b0\n" "RFL: 0x0000000000000046, RIP: 0xffffff7f85320e87, CS: 0x0000000000000008, SS: 0x0000000000000010\n" "Error code: 0x0000000000000000\n"@/SourceCache/xnu/xnu-2050.18.24/osfmk/i386/trap_native.c: 280

Backtrace (CPU 5), Frame : Return Address

0xffffff80d08e5ec0 : 0xffffff800341d626

0xffffff80d08e5f30 : 0xffffff80034b

Mac Pro, OS X Mountain Lion (10.8.2)

Posted on Jan 20, 2013 8:40 AM

Best reply, posted on Jan 20, 2013 11:38 AM in response to SchwarzerPeter

The Mac Pro's Xeon processors feature hardware error correction and use error-correcting code (ECC) memory.


Eight additional check bits are stored with each 64-bit word in RAM. On a read, the data and the check bits are combined into a syndrome, which is used to detect and correct errors.


Single-bit errors are corrected on the fly with essentially no slowdown in processing speed.


Uncorrectable errors, such as most double-bit errors, cause a kernel panic by design, to halt the machine and keep errors from propagating into your data.
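
For anyone curious how "correct one bit, detect two" works, here is a toy sketch in Python. It uses an extended Hamming code on 4 data bits rather than the 64 data bits plus 8 check bits on a real ECC DIMM, and it is not the code Intel's memory controller actually uses; the function names `encode` and `decode` are made up for the illustration.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7; check bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds an overall parity bit over positions 1..7.
    code[0] = sum(code[1:]) % 2
    return sum(bit << pos for pos, bit in enumerate(code))


def decode(word: int):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    code = [(word >> pos) & 1 for pos in range(8)]
    # The syndrome is the XOR of the positions of all set bits.
    # It is 0 for a clean codeword and equals the position of a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_parity_bad = sum(code) % 2 == 1

    if syndrome == 0 and not overall_parity_bad:
        status = "ok"
    elif overall_parity_bad:
        # Odd number of flips: assume one, and flip it back.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but overall parity looks fine: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))                # clean read
    print(decode(word ^ 0b00100000))   # one flipped bit
    print(decode(word ^ 0b00100100))   # two flipped bits
```

The last case is the analogue of what the panic log reports: the hardware can tell that something is wrong but cannot tell which bits to fix, so the only safe response is to stop.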


Your kernel panic does indeed show the characteristics of an uncorrectable RAM error. The likelihood that the problem is caused by any of the other failures you listed is vanishingly small. The relevant section of your log:



Channel number: 15 (unknown)

Memory Operation: read

Machine-specific error: Read ECC

COR_ERR_CNT: 0

Status bits:

Processor context corrupt

ADDR register valid

MISC register valid

Error enabled

Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

RTID: 136

DIMM: 0

Channel: 1

Syndrome: 0xd16986de
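
If you want to check these fields yourself across all eight panic logs, the raw register values can be decoded by hand. The sketch below is only that, a sketch: the flag positions in the top bits of the status register and the memory-controller error-code format follow Intel's architectural machine-check layout as I understand it, while the field positions I use for IA32_MC8_MISC (RTID, DIMM, channel, syndrome) are an assumption for this processor generation that happens to reproduce the values the kernel printed above. The function names are made up for the example.

```python
MEMORY_OPS = {
    0: "generic",
    1: "read",
    2: "write",
    3: "address/command",
    4: "scrub",
}


def decode_mc_status(status: int) -> dict:
    """Split an IA32_MCi_STATUS value into its flags and error codes."""
    flags = {
        "valid": 63,
        "overflow": 62,
        "uncorrected": 61,
        "enabled": 60,
        "misc_valid": 59,
        "addr_valid": 58,
        "context_corrupt": 57,
    }
    out = {name: bool(status >> bit & 1) for name, bit in flags.items()}
    out["mca_error_code"] = status & 0xFFFF
    out["model_specific_code"] = (status >> 16) & 0xFFFF
    return out


def decode_memory_error_code(code: int) -> dict:
    """Decode an MCA error code of the form 0000 0000 1MMM CCCC."""
    if code & 0xFF80 != 0x0080:
        raise ValueError("not a memory-controller error code")
    channel = code & 0xF
    return {
        "operation": MEMORY_OPS.get((code >> 4) & 0x7, "reserved"),
        # 15 means the error code itself does not name a channel.
        "channel": None if channel == 0xF else channel,
    }


def decode_mc8_misc(misc: int) -> dict:
    """Decode IA32_MC8_MISC. Field positions are assumed, see note above."""
    return {
        "rtid": misc & 0xFF,
        "dimm": (misc >> 16) & 0x3,
        "channel": (misc >> 18) & 0x3,
        "syndrome": (misc >> 32) & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    status = decode_mc_status(0xBE0000000001009F)
    print(status)
    print(decode_memory_error_code(status["mca_error_code"]))
    print(decode_mc8_misc(0xD16986DE00041388))
```

The thing to compare between logs is the DIMM and channel pair from the MISC register, not the CPU number in the panic line. If every log names the same DIMM and channel, that points at one module rather than at the processor.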



Jan 27, 2013 3:01 PM in response to Grant Bennet-Alder

Thanks for the advice. I got a kit with 3 DIMMs from Kensington and installed it (I had the Memory Slot Utility popup problem after installation, but found the relevant discussion on here to solve that). The machine has been running for 12 hours nonstop since the installation without any KPs, so I'm cautiously optimistic that the problem is solved.


Armin
