2 replies. Latest reply: Jan 27, 2013 3:01 PM by SchwarzerPeter
SchwarzerPeter Level 1 (0 points)

Hi all,

My Mac Pro (early 2009, quad-core 2.93 GHz, NVIDIA GeForce GTX 285) has had several kernel panics with automatic restarts (8 in total) over the past couple of weeks. All panic logs show hardware errors of the "Read ECC" kind, though the affected cores vary (3x CPU 0, 2x CPU 2, 1x each CPU 4, 6 and 7). I am hoping that it's only one of the DIMMs gone bad (I was thinking about upgrading the RAM anyway), but I fear it might be the memory controller, one of the caches or the CPU itself.

There seem to be quite a few knowledgeable people around, so could you please have a look at the panic log and give me a hint towards the culprit? It would be much appreciated.

Best regards, Armin

----------8<----------

Panic (system crashes) log:

  Source:          /Library/Logs/DiagnosticReports/Kernel_2013-01-20-163939_Kronos.panic

  Size:          9 KB (8.803 bytes)

  Last Modified:          20.01.13 16:39

  Recent Contents:          Sun Jan 20 16:39:39 2013

Machine-check capabilities 0x0000000000001c09:

family: 6 model: 26 stepping: 5 microcode: 17

Intel(R) Xeon(R) CPU           W3540  @ 2.93GHz

9 error-reporting banks

threshold-based error status present

extended corrected memory error handling present

Processor 0: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 1: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 2: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 3: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 4: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

Package 0 logged:

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  Channel number:         15 (unknown)

  Memory Operation:       read

  Machine-specific error: Read ECC

  COR_ERR_CNT:            0

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

  RTID:     136

  DIMM:     0

  Channel:  1

  Syndrome: 0xd16986de

Processor 5: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 6: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

Processor 7: machine-check status 0x0000000000000004:

machine-check in progress

MCA error-reporting registers:

IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid

IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid

IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid

IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid

IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid

IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid

IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid

IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid

IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid

  MCA error code:            0x009f

  Model specific error code: 0x0001

  Other information:         0x00000000

  Threshold-based status:    Undefined

  Status bits:

   Processor context corrupt

   ADDR register valid

   MISC register valid

   Error enabled

   Uncorrected error

IA32_MC8_ADDR(0x422): 0x00000001375d3f80

IA32_MC8_MISC(0x423): 0xd16986de00041388

panic(cpu 5 caller 0xffffff80034b83c9): "Machine Check at 0xffffff7f85320e87, registers:\n" "CR0: 0x000000008001003b, CR2: 0x00007fdc16020000, CR3: 0x0000000005d01000, CR4: 0x0000000000000660\n" "RAX: 0x0000000000000020, RBX: 0xffffff801444b800, RCX: 0x0000000000000001, RDX: 0x0000000000000000\n" "RSP: 0xffffff80dac5bd70, RBP: 0xffffff80dac5bda0, RSI: 0x0000000000000006, RDI: 0x0000000000000006\n" "R8:  0x0000000000000000, R9:  0x7ffffffffffffffe, R10: 0xffffff8015e72b28, R11: 0x0000000000000246\n" "R12: 0x0000000000000006, R13: 0xffffff80141ef540, R14: 0x0000000000000000, R15: 0x00000000000007b0\n" "RFL: 0x0000000000000046, RIP: 0xffffff7f85320e87, CS:  0x0000000000000008, SS:  0x0000000000000010\n" "Error code: 0x0000000000000000\n"@/SourceCache/xnu/xnu-2050.18.24/osfmk/i386/trap_native.c: 280

Backtrace (CPU 5), Frame : Return Address

0xffffff80d08e5ec0 : 0xffffff800341d626

0xffffff80d08e5f30 : 0xffffff80034b


Mac Pro, OS X Mountain Lion (10.8.2)
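Every processor in the log above reports the same IA32_MC8_STATUS value, 0xbe0000000001009f, and its architectural fields can be decoded by hand. The Python sketch below does that, assuming the IA32_MCi_STATUS bit layout and the memory-controller compound error format (000F 0000 1MMM CCCC) described in the machine-check chapter of Intel's Software Developer's Manual. The function names `decode_mci_status` and `describe_memory_code` are hypothetical, not part of any existing tool.

```python
# Minimal sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Bit positions follow the machine-check chapter of the Intel SDM.

STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory-controller compound error code.
MEMORY_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrubbing"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "corrected_count": (status >> 38) & 0x7FFF,      # bits 52:38
    }


def describe_memory_code(code: int) -> str:
    """Interpret a compound MCA code of the form 000F 0000 1MMM CCCC
    (memory-controller error); bit 12 (F, filtering) is ignored."""
    if (code & 0xEF80) != 0x0080:
        return "not a memory-controller error code"
    op = MEMORY_OPS.get((code >> 4) & 0x7, "reserved")  # MMM, bits 6:4
    channel = code & 0xF                                 # CCCC, bits 3:0
    chan = "unspecified" if channel == 0xF else str(channel)
    return f"memory controller error, operation={op}, channel={chan}"


if __name__ == "__main__":
    fields = decode_mci_status(0xBE0000000001009F)  # IA32_MC8_STATUS from the log
    for flag in fields["flags"]:
        print(flag)
    print(f"MCA code {fields['mca_error_code']:#06x}, "
          f"model-specific {fields['model_specific_code']:#06x}")
    print(describe_memory_code(fields["mca_error_code"]))
```

Run as-is, it lists VAL, UC, EN, MISCV, ADDRV and PCC (OVER is clear) and classifies 0x009f as a memory-controller read error with no channel specified, which matches the "Memory Operation: read" and "Channel number: 15 (unknown)" lines macOS printed.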
  • Grant Bennet-Alder Level 9 (51,915 points)

    The Mac Pro's Xeon processors support hardware error correction and use ECC (Error Correcting Code) memory.

    Eight additional check bits are stored with each 64-bit word in RAM. When the word is read out, the controller combines the data and the check bits to compute a syndrome, which is used to detect and correct errors.

    Single-bit errors are corrected on the fly with essentially no slowdown in processing speed.

    Uncorrectable errors, such as most double-bit errors, cause a kernel panic by design, halting the machine to keep the errors from propagating into your data.
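To make the check-bit idea concrete, here is a minimal Python sketch of a textbook (72,64) SECDED Hamming code: 64 data bits plus eight check bits. A nonzero syndrome with odd overall parity locates and flips a single bad bit; a nonzero syndrome with even parity flags an uncorrectable double-bit error. This illustrates the principle only. The thread does not say which code the Xeon's memory controller actually uses, and the syndrome it logs (0xd16986de) is 32 bits wide, so the real scheme likely differs. All function names are illustrative.

```python
# Minimal sketch of a textbook (72,64) SECDED Hamming code: 64 data bits,
# 7 Hamming check bits at power-of-two positions 1..64, plus one overall
# parity bit at position 0. Illustrative only, not the Xeon's actual code.
from typing import List, Optional, Tuple

N = 72  # codeword length: 64 data bits + 8 check bits


def is_data_position(pos: int) -> bool:
    """Positions 1..71 that are not powers of two carry data bits."""
    return (pos & (pos - 1)) != 0


def encode(data: int) -> List[int]:
    """Encode a 64-bit integer into a 72-bit SECDED codeword (list of bits)."""
    code = [0] * N
    d = 0
    for pos in range(1, N):
        if is_data_position(pos):
            code[pos] = (data >> d) & 1
            d += 1
    # Each check bit at position 2**p makes the parity over every position
    # with bit p set even.
    for p in range(7):
        check = 1 << p
        parity = 0
        for pos in range(1, N):
            if pos & check and pos != check:
                parity ^= code[pos]
        code[check] = parity
    code[0] = sum(code[1:]) % 2  # overall parity over the whole word
    return code


def extract(code: List[int]) -> int:
    """Read the 64 data bits back out of a codeword."""
    data, d = 0, 0
    for pos in range(1, N):
        if is_data_position(pos):
            data |= code[pos] << d
            d += 1
    return data


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None when the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, N):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        return "no error", extract(code)
    if odd_parity and syndrome < N:
        # Odd parity: assume one flipped bit at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "corrected", extract(fixed)
    # Even parity with a nonzero syndrome (a double-bit error), or an
    # impossible position: detect the error but do not try to correct it.
    return "uncorrectable", None


if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    stored = encode(word)
    one_bit = list(stored)
    one_bit[37] ^= 1
    two_bits = list(stored)
    two_bits[5] ^= 1
    two_bits[60] ^= 1
    print(decode(stored))
    print(decode(one_bit))
    print(decode(two_bits))
```

The three calls report no error, a corrected single-bit flip that returns the original word, and an uncorrectable double flip. In a real machine, that last case is the one that ends in a kernel panic.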

     

    Your kernel panic does indeed show the characteristics of an uncorrectable memory error, and the decoded fields below point at DIMM 0 on channel 1. The likelihood that the problem is caused by any of the other failures you listed is vanishingly small.

      Channel number:         15 (unknown)

      Memory Operation:       read

      Machine-specific error: Read ECC

      COR_ERR_CNT:            0

      Status bits:

       Processor context corrupt

       ADDR register valid

       MISC register valid

       Error enabled

       Uncorrected error

    IA32_MC8_ADDR(0x422): 0x00000001375d3f80

    IA32_MC8_MISC(0x423): 0xd16986de00041388

      RTID:     136

      DIMM:     0

      Channel:  1

      Syndrome: 0xd16986de
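The DIMM, channel and syndrome lines quoted above come from the model-specific part of IA32_MC8_STATUS and from IA32_MC8_MISC (0xd16986de00041388). A minimal Python sketch of that decoding follows, assuming the Nehalem integrated-memory-controller layout used by Linux's i7core_edac driver: error type in STATUS bits 24:16, DIMM in MISC bits 17:16, channel in bits 19:18, syndrome in bits 63:32. Reading RTID from MISC bits 7:0 is an additional assumption that happens to reproduce the RTID of 136 in the log. `decode_imc` is a hypothetical helper name.

```python
# Minimal sketch: decode the Nehalem integrated-memory-controller (bank 8)
# fields. Layout follows Linux's i7core_edac driver; the RTID field
# position (MISC bits 7:0) is an assumption.

# Model-specific error bits in IA32_MC8_STATUS (bits 24:16).
IMC_ERRORS = {
    16: "read ECC error",
    17: "RAS ECC error",
    18: "write parity error",
    19: "redundancy loss",
    21: "memory range error",
    22: "RTID out of range",
    23: "address parity error",
    24: "byte enable parity error",
}


def decode_imc(status: int, misc: int) -> dict:
    """Hypothetical helper: pull error type, DIMM, channel and syndrome
    out of IA32_MC8_STATUS / IA32_MC8_MISC."""
    return {
        "errors": [name for bit, name in IMC_ERRORS.items() if (status >> bit) & 1],
        "rtid": misc & 0xFF,                    # bits 7:0 (assumed)
        "dimm": (misc >> 16) & 0x3,             # bits 17:16
        "channel": (misc >> 18) & 0x3,          # bits 19:18
        "syndrome": (misc >> 32) & 0xFFFFFFFF,  # bits 63:32
    }


if __name__ == "__main__":
    info = decode_imc(0xBE0000000001009F, 0xD16986DE00041388)
    print(info["errors"])
    print(f"RTID {info['rtid']}, DIMM {info['dimm']}, channel {info['channel']}, "
          f"syndrome {info['syndrome']:#010x}")
```

With the values from this log it yields a single "read ECC error", DIMM 0, channel 1 and syndrome 0xd16986de, the same as the macOS decode. How channel 1 / DIMM 0 maps to a numbered slot in the Mac Pro is not stated in the thread.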

     

  • SchwarzerPeter Level 1 (0 points)

    Thanks for the advice. I got a kit with 3 DIMMs from Kensington and installed it (I had the Memory Slot Utility pop-up problem after installation, but found the relevant discussion here to solve that). The machine ran for 12 hours nonstop after installation without any kernel panics, so I'm cautiously optimistic that the problem is solved.

     

    Armin