Mavericks and Failed ARP causing network drops!

Question

I have been racking my brain over why, on our corporate network, we started seeing dropped packets every 30-60 seconds after the Mavericks upgrade. Here is an example from ping:

64 bytes from 10.11.12.13: icmp_seq=135 ttl=63 time=3.705 ms

64 bytes from 10.11.12.13: icmp_seq=136 ttl=63 time=3.473 ms

64 bytes from 10.11.12.13: icmp_seq=137 ttl=63 time=3.811 ms

64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms

Request timeout for icmp_seq 139

Request timeout for icmp_seq 140

Request timeout for icmp_seq 141

Request timeout for icmp_seq 142

Request timeout for icmp_seq 143

64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms

64 bytes from 10.11.12.13: icmp_seq=145 ttl=63 time=3.587 ms

64 bytes from 10.11.12.13: icmp_seq=146 ttl=63 time=3.744 ms

64 bytes from 10.11.12.13: icmp_seq=147 ttl=63 time=3.486 ms

64 bytes from 10.11.12.13: icmp_seq=148 ttl=63 time=3.466 ms
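To put numbers on drops like the ones above, a saved ping log can be summarized with a small helper. This is only a sketch: `summarize_ping` is a hypothetical name, and it assumes the log uses the same "bytes from" and "Request timeout" wording as the output in this post.

```shell
# Hypothetical helper: read ping output on stdin and report the number of
# replies, the number of timeouts, and the longest run of consecutive timeouts.
summarize_ping() {
    awk '
        /bytes from/       { ok++; run = 0 }
        /^Request timeout/ { lost++; run++; if (run > longest) longest = run }
        END { printf "replies=%d timeouts=%d longest_gap=%d\n", ok, lost, longest }
    '
}
```

For example, run `ping 10.11.12.13 | tee ping.log`, stop it after a few minutes, then run `summarize_ping < ping.log`. A longest gap of about five would match the outage described here.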

I think I have found a strange ARP issue that is causing it. In our corporate environment, we run GLBP (Gateway Load Balancing Protocol) on Cisco gear. As a result, the gateway address floats between two devices, which requires the MAC address to change. It looks something like this in the ARP table:

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]

On my Mountain Lion machine, it sends a broadcast ARP request and gets a response with the new MAC address immediately.

258 26.783206000 Apple_78:29:dd Broadcast ARP 42 Who has 10.224.165.1? Tell 10.224.165.55

259 26.786929000 Cisco_e0:ff:40 Apple_78:29:dd ARP 60 10.224.165.1 is at 00:07:b4:02:cb:01

This happens seamlessly in the background and no packet loss is observed. However, it looks like Mavericks is doing something completely different, and WRONG. It sends out 5 UNICAST requests to the MAC address it had before (ARP should always be broadcast!!!). These 5 attempts fail, and only then does it finally make a BROADCAST attempt, as shown in the capture below. The result is a network outage of about 5 seconds on the machine.

394 67.052366000 Apple_b9:a6:b2 Cisco_02:cb:02 ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

395 68.053450000 Apple_b9:a6:b2 Cisco_02:cb:02 ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

396 69.053595000 Apple_b9:a6:b2 Cisco_02:cb:02 ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

397 70.053893000 Apple_b9:a6:b2 Cisco_02:cb:02 ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

398 71.054363000 Apple_b9:a6:b2 Cisco_02:cb:02 ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

399 72.054466000 Apple_b9:a6:b2 Broadcast ARP 42 Who has 10.224.165.1? Tell 10.224.165.225

400 72.058079000 Cisco_e0:ff:40 Apple_b9:a6:b2 ARP 60 10.224.165.1 is at 00:07:b4:02:cb:01
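To check whether another capture shows the same pattern, the unicast requests can be counted from a text export. This is a rough sketch, assuming the export is whitespace-separated with the same column order as the lines above (number, time, source, destination, protocol, length, info). The function name is hypothetical.

```shell
# Hypothetical helper: count ARP "Who has" requests whose destination column
# is a specific device rather than Broadcast. Reads capture lines on stdin.
count_unicast_arp_requests() {
    awk '$5 == "ARP" && /Who has/ && $4 != "Broadcast" { n++ } END { print n + 0 }'
}
```

Fed the seven lines above, this should report the five unicast attempts that precede the broadcast.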

Here is the arp table during this period:

macsccmtest:~ administrator$ arp -a

? (10.224.165.1) at (incomplete) on en1 ifscope [ethernet]

? (10.224.165.220) at f0:b4:79:21:4c:ec on en1 ifscope [ethernet]

My hunch is that Apple did this to try to reduce bandwidth utilization on the network but it will cause BIG problems on corporate networks that use GLBP or any other protocol to provide redundancy across multiple devices!

Anyone else seeing this? Everyone in my office who has moved to Mavericks can replicate this behavior.

OS X Mavericks (10.9)

Posted on Oct 25, 2013 11:12 AM

Answer 1

Top-ranking reply

Lunaweb

Level 1

10 points

Oct 26, 2013 1:45 AM in response to ashvartsman

Same problem here. I spent a lot of time identifying this problem.

Finally, the solution for this:

$ sudo sysctl -w net.link.ether.inet.arp_unicast_lim=0

net.link.ether.inet.arp_unicast_lim: 5 -> 0

(you can set this also in /etc/sysctl.conf)

This disables the unicast ARP requests.
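Before changing anything, it may help to see what the machine is currently using. The sketch below only reads the value; `arp_unicast_status` is a hypothetical helper name, and the key is the one given in this reply. On a system where the key does not exist, it says so instead of failing.

```shell
# Hypothetical helper: report the current unicast ARP limit, if the key exists.
arp_unicast_status() {
    key="net.link.ether.inet.arp_unicast_lim"
    if current=$(sysctl -n "$key" 2>/dev/null); then
        echo "$key is currently $current"
        if [ "$current" -ne 0 ]; then
            echo "to disable unicast ARP requests: sudo sysctl -w $key=0"
        fi
    else
        echo "$key is not available on this system"
    fi
}
```

Calling `arp_unicast_status` before and after the `sysctl -w` command shows whether the change took effect.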

Answer 2

ashvartsman Author

Level 1

0 points

Oct 29, 2013 11:47 AM in response to Lunaweb

Will editing /etc/sysctl.conf survive a reboot? Just setting the variable does not.

Answer 3

ashvartsman Author

Level 1

0 points

Oct 29, 2013 1:09 PM in response to ashvartsman

If you are missing the /etc/sysctl.conf file, just create it as described below:

The file must be owned by root:wheel with permissions -rw-r--r--. Check this by running ls -al /etc/sysctl.conf on the command line. If the owner or permissions differ, fix them by running chown root:wheel /etc/sysctl.conf and chmod 0644 /etc/sysctl.conf.

Then enter the variable changes for the fix.
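The steps above could be scripted roughly as follows. This is a sketch, not a tested installer: `add_arp_fix` is a hypothetical helper name, and it only adds the line from Lunaweb's reply if it is not already present.

```shell
# Hypothetical helper: append the setting to a sysctl.conf-style file,
# creating the file if needed and skipping the write if the line exists.
add_arp_fix() {
    conf="$1"
    line="net.link.ether.inet.arp_unicast_lim=0"
    touch "$conf" || return 1
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}
```

For the real file, this would be run as root with /etc/sysctl.conf as the argument, followed by the chown root:wheel and chmod 0644 commands given above.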

Answer 4

Oct 29, 2013 4:57 PM in response to Lunaweb

That's awesome - thanks!

Here's a script that automates that:

https://github.com/MacMiniVault/Mac-Scripts/blob/master/unicastarp/unicastarp-README.md

Answer 5

Oct 30, 2013 5:33 PM in response to jonschwenn

Here's a script that fixes the issue temporarily (it doesn't survive a reboot). It simply automates what Lunaweb was suggesting. This method was preferable for our use case because we think Apple will resolve this issue in 10.9.1 (or another patch), and we didn't want to muck with writing out a new /etc/sysctl.conf file.

https://gist.github.com/l1m5/7242676

Answer 6

thmphams

Level 1

0 points

Oct 31, 2013 8:55 AM in response to ashvartsman

Thanks for an excellent analysis. We have exactly the same issue, and we also use GLBP on our core switches. Sites that don't use GLBP have no issue. I applied the fix from Lunaweb and it has fixed the issue so far. It worked even after a reboot. After creating the file, make sure the proper permissions are applied to it, as ashvartsman mentioned.

Answer 7

thmphams

Level 1

0 points

Nov 5, 2013 1:45 PM in response to ashvartsman

Not sure if the people who designed Mavericks had enterprise networks in mind, but there are problems with GLBP for sure. Now I just discovered that we have a problem with our Cisco AnyConnect VPN also. The connection drops EVERY minute when on VPN. Outlook for Mac also has problems... time to go back to the big cat... Mavericks is NO good for enterprise networks. I have already had enough problems with Mavericks so far... I can't work on my MacBook Pro (late 2012 version) when the connection keeps dropping...

Answer 8

ashvartsman Author

Level 1

0 points

Nov 5, 2013 1:52 PM in response to thmphams

This should help you for now with the AnyConnect:

https://supportforums.cisco.com/thread/2247235

Look for the post about the new version Cisco came out with to patch the issue.

Answer 9

thmphams

Level 1

0 points

Nov 6, 2013 3:47 PM in response to ashvartsman

Hi ashvartsman, thanks for the link. I've got Cisco AnyConnect version 3.1.04074 installed. It's looking good so far, as I no longer see dropped connections on Mavericks. I need to test more on another machine to see if it is actually stable.

Answer 10

Aiap

Level 1

0 points

Nov 18, 2013 11:47 AM in response to jonschwenn

Hey, if I ran the script, how could I reset it to the default if I wanted to?

Thanks

Answer 11

Nov 18, 2013 1:23 PM in response to Aiap

You can 'cat /etc/sysctl.conf' to see if there is anything more than the one line for "net.link.ether.inet.arp_unicast_lim=0"...

If that is the only line, then you can remove the file. Otherwise you'd just want to remove that one line. A reboot would make the change effective.
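The same cleanup could be sketched as a small function. `remove_arp_fix` is a hypothetical name, and it assumes the setting was written as a line starting with the key followed by `=`, as in the fix earlier in this thread.

```shell
# Hypothetical helper: drop the arp_unicast_lim line from a sysctl.conf-style
# file, and delete the file entirely if nothing else is left in it.
remove_arp_fix() {
    conf="$1"
    [ -f "$conf" ] || return 0
    tmp=$(mktemp) || return 1
    # grep exits non-zero when no lines remain, so its status is ignored.
    grep -v '^net\.link\.ether\.inet\.arp_unicast_lim=' "$conf" > "$tmp" || true
    if [ -s "$tmp" ]; then
        cat "$tmp" > "$conf"   # overwrite in place to keep owner and mode
    else
        rm -f "$conf"
    fi
    rm -f "$tmp"
}
```

For the real file this would be run as root against /etc/sysctl.conf. To restore the running value without a reboot, the sysctl output earlier in the thread shows the previous value was 5, so `sudo sysctl -w net.link.ether.inet.arp_unicast_lim=5` should put it back.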

Answer 12

Nov 21, 2013 1:53 AM in response to ashvartsman

Thank you for diagnosing and solving this. I have this problem on my residential connection and it's been driving me nuts ever since the Mavericks upgrade. I've filed a bug at https://bugreport.apple.com (I suggest you do the same) and let's hope they do something about it.

Answer 13

Dec 17, 2013 9:18 AM in response to ashvartsman

JFYI. The issue still remains in 10.9.1.

Answer 14

matdavis

Level 1

0 points

Dec 17, 2013 9:24 PM in response to Akira Okumura

Agreed. This is to my router.

--- 172.31.1.1 ping statistics ---

2286 packets transmitted, 1879 packets received, 17.8% packet loss

round-trip min/avg/max/stddev = 0.861/143.721/2768.362/383.409 ms

mini-wifi:~ matthewdavis$

mini-wifi:~ matthewdavis$ sudo sysctl -a | grep net.link.ether.inet.arp_unicast_lim

Password:

net.link.ether.inet.arp_unicast_lim: 0

This is with no pending software updates as of today.

Answer 15

Dec 18, 2013 7:20 AM in response to Akira Okumura

I can also confirm this has not been fixed in 10.9.1. Just keep updating the bug reports and wait I suppose.
