ashvartsman

Q: Mavericks and Failed ARP causing network drops!

I have been racking my brain over why, since upgrading to Mavericks, we see dropped packets every 30-60 seconds on our corporate network.  Here is an example from ping:

 

64 bytes from 10.11.12.13: icmp_seq=135 ttl=63 time=3.705 ms

64 bytes from 10.11.12.13: icmp_seq=136 ttl=63 time=3.473 ms

64 bytes from 10.11.12.13: icmp_seq=137 ttl=63 time=3.811 ms

64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms

Request timeout for icmp_seq 139

Request timeout for icmp_seq 140

Request timeout for icmp_seq 141

Request timeout for icmp_seq 142

Request timeout for icmp_seq 143

64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms

64 bytes from 10.11.12.13: icmp_seq=145 ttl=63 time=3.587 ms

64 bytes from 10.11.12.13: icmp_seq=146 ttl=63 time=3.744 ms

64 bytes from 10.11.12.13: icmp_seq=147 ttl=63 time=3.486 ms

64 bytes from 10.11.12.13: icmp_seq=148 ttl=63 time=3.466 ms
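
For anyone who wants to measure the drops from a saved ping log instead of eyeballing them, here is a minimal Python sketch. It assumes the log uses the same macOS ping wording as above; the function name find_loss_runs is made up for illustration and is not part of any tool.

```python
import re

def find_loss_runs(ping_output):
    """Return (first_seq, last_seq) for each run of consecutive timeouts."""
    runs = []
    start = prev = None
    for line in ping_output.splitlines():
        text = line.strip()
        m = re.match(r"Request timeout for icmp_seq (\d+)", text)
        if m:
            seq = int(m.group(1))
            if start is None:
                start = seq          # first timeout of a new run
            prev = seq
        elif start is not None and text:
            runs.append((start, prev))  # a non-timeout line ends the run
            start = None
    if start is not None:
        runs.append((start, prev))      # log ended while still timing out
    return runs

# Excerpt of the ping output quoted above.
sample = """\
64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms
Request timeout for icmp_seq 139
Request timeout for icmp_seq 140
Request timeout for icmp_seq 141
Request timeout for icmp_seq 142
Request timeout for icmp_seq 143
64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms
"""

for first, last in find_loss_runs(sample):
    print(f"lost icmp_seq {first}-{last} ({last - first + 1} packets)")
```

With ping's default one-second interval, the number of packets in a run is roughly the outage length in seconds.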

 

 

I think I have found a strange ARP issue that is causing it.  In our corporate environment, we run GLBP (Gateway Load Balancing Protocol) on Cisco gear.  As such, the gateway address floats between two devices, which requires the MAC address to change.  It looks something like this in the ARP table:

 

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]
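
To catch the gateway MAC flipping without watching the terminal, two snapshots of arp -a can be compared. This is only a sketch: it assumes the macOS output format shown above, and the helper names parse_arp and normalize_mac are hypothetical.

```python
import re

# Matches lines like: ? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")

def parse_arp(output):
    """Map IP -> MAC string for each arp -a line; '(incomplete)' is kept as-is."""
    table = {}
    for line in output.splitlines():
        m = ARP_LINE.search(line)
        if m:
            table[m.group("ip")] = m.group("mac")
    return table

def normalize_mac(mac):
    """Zero-pad each octet so 0:7:b4:2:cb:2 compares equal to 00:07:b4:02:cb:02."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))

# The two snapshots quoted above.
before = parse_arp("? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]")
after = parse_arp("? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]")

for ip, old_mac in before.items():
    new_mac = after.get(ip)
    if new_mac is not None and normalize_mac(new_mac) != normalize_mac(old_mac):
        print(f"{ip}: {normalize_mac(old_mac)} -> {normalize_mac(new_mac)}")
```

Normalizing matters because arp -a drops leading zeros while Wireshark prints them, so the same address looks different in the two tools.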

 

On my Mountain Lion machine, it sends a broadcast ARP request and gets a response with the new MAC address immediately.

 

258          26.783206000          Apple_78:29:dd          Broadcast          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.55
259          26.786929000          Cisco_e0:ff:40          Apple_78:29:dd          ARP          60          10.224.165.1 is at 00:07:b4:02:cb:01

 

This happens seamlessly in the background, and no packet loss is observed.  Mavericks, however, appears to be doing something completely different, and WRONG.  It sends 5 UNICAST requests to the MAC address it had before (ARP should always be broadcast!!!).  Those 5 attempts fail, and only then does it finally make a BROADCAST attempt, as shown below.  This causes a network outage of about 5 seconds on the machine.

 

394          67.052366000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

395          68.053450000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

396          69.053595000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

397          70.053893000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

398          71.054363000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

399          72.054466000          Apple_b9:a6:b2          Broadcast          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

400          72.058079000          Cisco_e0:ff:40          Apple_b9:a6:b2          ARP          60          10.224.165.1 is at 00:07:b4:02:cb:01
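
The capture above can be checked mechanically. The following is a rough sketch that assumes a Wireshark text export with columns separated by two or more spaces (No., Time, Source, Destination, Protocol, Length, Info); parse_rows and summarize are illustrative names, not Wireshark functions.

```python
import re

def parse_rows(text):
    """Split each capture line into its seven columns; Info may contain spaces."""
    rows = []
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=6)
        if len(parts) == 7:
            no, time, src, dst, proto, length, info = parts
            rows.append({"time": float(time), "src": src, "dst": dst, "info": info})
    return rows

def summarize(rows):
    """Return (unicast request count, seconds from first request to the reply)."""
    requests = [r for r in rows if r["info"].startswith("Who has")]
    unicast = [r for r in requests if r["dst"] != "Broadcast"]
    reply = next(r for r in rows if " is at " in r["info"])
    return len(unicast), reply["time"] - requests[0]["time"]

# The Mavericks capture quoted above.
sample = """\
394  67.052366000  Apple_b9:a6:b2  Cisco_02:cb:02  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
395  68.053450000  Apple_b9:a6:b2  Cisco_02:cb:02  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
396  69.053595000  Apple_b9:a6:b2  Cisco_02:cb:02  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
397  70.053893000  Apple_b9:a6:b2  Cisco_02:cb:02  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
398  71.054363000  Apple_b9:a6:b2  Cisco_02:cb:02  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
399  72.054466000  Apple_b9:a6:b2  Broadcast  ARP  42  Who has 10.224.165.1?  Tell 10.224.165.225
400  72.058079000  Cisco_e0:ff:40  Apple_b9:a6:b2  ARP  60  10.224.165.1 is at 00:07:b4:02:cb:01
"""

rows = parse_rows(sample)
unicast_count, gap = summarize(rows)
print(f"{unicast_count} unicast requests, {gap:.3f} s until the gateway answered")
```

On this capture it reports 5 unicast requests and a gap of about 5.006 seconds, which lines up with the run of five ping timeouts.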

 

 

Here is the arp table during this period:

 

macsccmtest:~ administrator$ arp -a

? (10.224.165.1) at (incomplete) on en1 ifscope [ethernet]

? (10.224.165.220) at f0:b4:79:21:4c:ec on en1 ifscope [ethernet]

 

 

My hunch is that Apple did this to try to reduce bandwidth utilization on the network, but it will cause BIG problems on corporate networks that use GLBP or any other protocol that provides redundancy across multiple devices!

 

Anyone else seeing this?  Everyone in my office who has moved to Mavericks can replicate this behavior.

OS X Mavericks (10.9)

Posted on Oct 25, 2013 11:12 AM
