ashvartsman

Q: Mavericks and Failed ARP causing network drops!

I have been racking my brain over why, after the Mavericks upgrade, we started to see dropped packets every 30-60 seconds on our corporate network.  Here is an example from ping:

 

64 bytes from 10.11.12.13: icmp_seq=135 ttl=63 time=3.705 ms

64 bytes from 10.11.12.13: icmp_seq=136 ttl=63 time=3.473 ms

64 bytes from 10.11.12.13: icmp_seq=137 ttl=63 time=3.811 ms

64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms

Request timeout for icmp_seq 139

Request timeout for icmp_seq 140

Request timeout for icmp_seq 141

Request timeout for icmp_seq 142

Request timeout for icmp_seq 143

64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms

64 bytes from 10.11.12.13: icmp_seq=145 ttl=63 time=3.587 ms

64 bytes from 10.11.12.13: icmp_seq=146 ttl=63 time=3.744 ms

64 bytes from 10.11.12.13: icmp_seq=147 ttl=63 time=3.486 ms

64 bytes from 10.11.12.13: icmp_seq=148 ttl=63 time=3.466 ms
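
A run of timeouts like the one above can be picked out of saved ping output with a short script. This is only a minimal Python sketch: the function name `loss_bursts` and the `sample` text are illustrative, and it assumes the output format shown here.

```python
import re

def loss_bursts(ping_output):
    """Return (first_seq, last_seq) for each run of consecutive timeouts."""
    bursts = []
    run = []
    for line in ping_output.splitlines():
        line = line.strip()
        timeout = re.match(r"Request timeout for icmp_seq (\d+)", line)
        reply = re.search(r"icmp_seq=(\d+)", line)
        if timeout:
            run.append(int(timeout.group(1)))
        elif reply and run:
            # A successful reply ends the current run of timeouts.
            bursts.append((run[0], run[-1]))
            run = []
    if run:
        # The output ended while packets were still being lost.
        bursts.append((run[0], run[-1]))
    return bursts

sample = """\
64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms
Request timeout for icmp_seq 139
Request timeout for icmp_seq 140
Request timeout for icmp_seq 141
Request timeout for icmp_seq 142
Request timeout for icmp_seq 143
64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms
"""

for first, last in loss_bursts(sample):
    print(f"lost icmp_seq {first}-{last} ({last - first + 1} packets)")
```

Lines that are neither a reply nor a timeout are ignored, so they do not split a run.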

 

 

I think I have found a strange ARP issue that is causing it.  In our corporate environment, we run GLBP (Gateway Load Balancing Protocol) on Cisco gear.  As a result, the gateway address floats between two devices, which requires its MAC address to change.  It looks something like this in the ARP table:

 

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]

efl-ashvartsman:~ ashvartsman$ arp -a

? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]

 

On my Mountain Lion machine, the OS sends a broadcast ARP request and gets a response with the new MAC address immediately.

 

258          26.783206000          Apple_78:29:dd          Broadcast          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.55
259          26.786929000          Cisco_e0:ff:40          Apple_78:29:dd          ARP          60          10.224.165.1 is at 00:07:b4:02:cb:01

 

This happens seamlessly in the background and no packet loss is observed.  However, Mavericks appears to be doing something completely different, and WRONG.  It sends 5 UNICAST requests to the MAC address it had before (ARP should always be broadcast!!!).  These fail 5 times, and only then does it make a BROADCAST attempt, as shown below.  The result is an outage of about 5 seconds for the machine.

 

394          67.052366000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

395          68.053450000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

396          69.053595000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

397          70.053893000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

398          71.054363000          Apple_b9:a6:b2          Cisco_02:cb:02          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

399          72.054466000          Apple_b9:a6:b2          Broadcast          ARP          42          Who has 10.224.165.1?  Tell 10.224.165.225

400          72.058079000          Cisco_e0:ff:40          Apple_b9:a6:b2          ARP          60          10.224.165.1 is at 00:07:b4:02:cb:01
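
The timing in this trace can be sketched as a simple model. This is a rough Python illustration, not the kernel's actual logic: the function names are made up, the 1-second retry interval is read off the timestamps above, and it assumes the limit simply counts unicast attempts before the broadcast fallback.

```python
def probe_schedule(unicast_limit, retry_interval_s=1.0):
    """List (seconds_from_start, kind) for each ARP request in the model."""
    if unicast_limit < 0:
        raise ValueError("unicast_limit must be non-negative")
    # Unanswered unicast probes go to the stale MAC address first.
    schedule = [(i * retry_interval_s, "unicast") for i in range(unicast_limit)]
    # Only after they are exhausted does the broadcast request go out.
    schedule.append((unicast_limit * retry_interval_s, "broadcast"))
    return schedule

def arp_recovery_delay(unicast_limit, retry_interval_s=1.0):
    """Seconds spent waiting before the broadcast request is sent."""
    return probe_schedule(unicast_limit, retry_interval_s)[-1][0]

for limit in (5, 0):
    print(f"limit={limit}: broadcast after {arp_recovery_delay(limit)} s")
```

With a limit of 5 the model gives the roughly 5 second gap seen between frames 394 and 399; with a limit of 0 the broadcast goes out immediately.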

 

 

Here is the arp table during this period:

 

macsccmtest:~ administrator$ arp -a

? (10.224.165.1) at (incomplete) on en1 ifscope [ethernet]

? (10.224.165.220) at f0:b4:79:21:4c:ec on en1 ifscope [ethernet]

 

 

My hunch is that Apple did this to try to reduce bandwidth utilization on the network but it will cause BIG problems on corporate networks that use GLBP or any other protocol to provide redundancy across multiple devices!

 

Anyone else seeing this?  Everyone in my office who has moved to Mavericks can replicate this behavior.

OS X Mavericks (10.9)

Posted on Oct 25, 2013 11:12 AM

  • by jonschwenn,

    jonschwenn Dec 18, 2013 7:20 AM in response to Akira Okumura
    Level 1 (10 points)

    I can also confirm this has not been fixed in 10.9.1.  Just keep updating the bug reports and wait I suppose.

  • by Gup06,

    Gup06 Dec 18, 2013 8:02 AM in response to ashvartsman
    Level 1 (0 points)

    I just installed the script by jonschwenn that Lunaweb had scripted out.

     

    I will try this as my Air has been having major issues with our Cisco wireless network since upgrading to Mavericks. 

     

    Thanks much, and I will report back if this works for me.

  • by a brody,

    a brody Dec 18, 2013 8:50 AM in response to ashvartsman
    Level 9 (66,781 points)
    Classic Mac OS

    Let's keep an eye open for AirPort updates to see if it is fixed, and if the command needs to be revised after such updates.  I've referenced this thread in my tip.

  • by Philip Ershler,

    Philip Ershler Dec 19, 2013 10:45 AM in response to Gup06
    Level 1 (0 points)

    Applying this script has just made matters worse for a Retina, 15-inch, Late 2013 MBP. Before I applied the script, the ping times to my gateway were all over the place, but I was not losing any pings. Now I'm seeing this.

     

    PING 155.100.140.1 (155.100.140.1): 56 data bytes

    64 bytes from 155.100.140.1: icmp_seq=0 ttl=64 time=11.729 ms

    64 bytes from 155.100.140.1: icmp_seq=1 ttl=64 time=25.814 ms

    Request timeout for icmp_seq 2

    Request timeout for icmp_seq 3

    Request timeout for icmp_seq 4

    Request timeout for icmp_seq 5

    Request timeout for icmp_seq 6

    Request timeout for icmp_seq 7

    Request timeout for icmp_seq 8

    Request timeout for icmp_seq 9

    Request timeout for icmp_seq 10

    64 bytes from 155.100.140.1: icmp_seq=11 ttl=64 time=4.447 ms

    64 bytes from 155.100.140.1: icmp_seq=12 ttl=64 time=1.292 ms

    64 bytes from 155.100.140.1: icmp_seq=13 ttl=64 time=3.395 ms

    ping: sendto: No route to host

    ping: sendto: No route to host

    Request timeout for icmp_seq 14

    ping: sendto: No route to host

    Request timeout for icmp_seq 15

    ping: sendto: No route to host

    Request timeout for icmp_seq 16

    Request timeout for icmp_seq 17

    64 bytes from 155.100.140.1: icmp_seq=18 ttl=64 time=1.184 ms

    64 bytes from 155.100.140.1: icmp_seq=19 ttl=64 time=1.288 ms

    64 bytes from 155.100.140.1: icmp_seq=20 ttl=64 time=104.871 ms

    64 bytes from 155.100.140.1: icmp_seq=21 ttl=64 time=4.029 ms

    64 bytes from 155.100.140.1: icmp_seq=22 ttl=64 time=4.928 ms

    64 bytes from 155.100.140.1: icmp_seq=23 ttl=64 time=3.167 ms

    64 bytes from 155.100.140.1: icmp_seq=24 ttl=64 time=1.449 ms

    64 bytes from 155.100.140.1: icmp_seq=25 ttl=64 time=13.503 ms

    64 bytes from 155.100.140.1: icmp_seq=26 ttl=64 time=9.271 ms

    64 bytes from 155.100.140.1: icmp_seq=27 ttl=64 time=194.988 ms

    Request timeout for icmp_seq 28

    Request timeout for icmp_seq 29

    Request timeout for icmp_seq 30

    Request timeout for icmp_seq 31

    Request timeout for icmp_seq 32

     

    My 17-Inch Mid 2010 MBP gets quite consistent ping times between 1 and 11 ms. Both machines are running 10.9.1 and connected to the same Airport Extreme.

     

    This is really a bummer.

  • by Gup06,

    Gup06 Dec 21, 2013 7:43 AM in response to ashvartsman
    Level 1 (0 points)

    I applied the following statement to /etc/sysctl.conf and it has resolved the timeout issues I was having at work.

     

    "net.link.ether.inet.arp_unicast_lim=0"

     

    It also has improved the home wireless connectivity with my Air.  It doesn't disconnect like it used to.

  • by matdavis,

    matdavis Dec 21, 2013 8:05 AM in response to Gup06
    Level 1 (0 points)

    Unfortunately, I tried that and it didn't work. I replaced my router: I had a Linksys E3000, and I replaced it with an Asus RT-N66U. Now I have full speeds and no packet loss on my mini.

  • by jonschwenn,

    jonschwenn Dec 21, 2013 9:47 AM in response to matdavis
    Level 1 (10 points)

    I've seen a few mentions of consumer Wi-Fi scenarios in this thread (with the fix not working).  I've seen some corporate-level mentions as well.

     

    Overall, I believe there are multiple core issues with similar symptoms.  This thread is centered on the idea that when GLBP/HSRP is used in a corporate environment, the OS is confused by the two routers and drops packets - that is just a generic overview.

     

    A home wireless setup would not be the same setup.  I haven't personally tested it, but the OS might also be confused by multiple WAPs or dual-band WAPs?  It doesn't appear that setting this unicast ARP kernel parameter fixes those other issues.

  • by William Kucharski,

    William Kucharski Dec 23, 2013 4:37 AM in response to jonschwenn
    Level 6 (15,068 points)
    Mac OS X

    Did anyone actually file a bug report on this?

  • by anfedoro,

    anfedoro Dec 23, 2013 5:56 AM in response to William Kucharski
    Level 1 (0 points)

    Gents, it seems that iOS 7 has the same problem (screenshot: IMG_6542.png).

    This gap happens every minute.

    Can anyone confirm?

    It really affects all real-time applications, e.g. FaceTime or any other VoIP app.

  • by William Kucharski,

    William Kucharski Dec 24, 2013 10:24 AM in response to ashvartsman
    Level 6 (15,068 points)
    Mac OS X

    This is a perfectly valid technique for performing ARP cache validation per RFC 1122, section 2.3.2.1:

     

    http://tools.ietf.org/html/rfc1122#page-22

     

    so I suspect it's more Cisco's feature not playing well with a known technique. :(

  • by jonschwenn,

    jonschwenn Dec 27, 2013 6:23 AM in response to William Kucharski
    Level 1 (10 points)

    I wouldn't say Apple is free of any responsibility for this bug.

     

    Packet loss is known to occur with both HSRP (RFC 2281) and GLBP (Cisco's protocol).  There seem to be other scenarios as well.

     

    If Apple doesn't adapt their implementation, it'll be quite a while before anything filters through on the Cisco side and network administrators of large companies that run HSRP/GLBP perform the upgrade.  Granted, enough unhappy Mac users may push that along a little faster than expected.

  • by goft20,

    goft20 Jan 7, 2014 6:12 PM in response to jonschwenn
    Level 1 (0 points)

    Hey, how do I stop the script from applying at every reboot?

    Thanks!

  • by jonschwenn,

    jonschwenn Jan 7, 2014 6:52 PM in response to goft20
    Level 1 (10 points)

    Use vi or nano to remove the following line from /etc/sysctl.conf

     

    net.link.ether.inet.arp_unicast_lim=0

     

    If that's the only line (sudo cat /etc/sysctl.conf) you can also just remove the file (sudo rm /etc/sysctl.conf). 

     

    Run commands at your own risk, please proceed with caution.  Commands like 'rm' can be dangerous if there are typos, etc.
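
    For anyone who would rather not hand-edit the file, the removal described above can be sketched in Python. This is only an illustration under stated assumptions: `remove_setting` is a made-up helper that works on the file's text, it assumes plain `key=value` lines, and it does not touch /etc/sysctl.conf itself.

```python
KEY = "net.link.ether.inet.arp_unicast_lim"

def remove_setting(conf_text, key=KEY):
    """Return conf_text without any 'key=value' line for the given key."""
    kept = [
        line for line in conf_text.splitlines()
        # Compare only the part before '=' so other settings are left alone.
        if line.split("=", 1)[0].strip() != key
    ]
    return "".join(line + "\n" for line in kept)

before = "net.link.ether.inet.arp_unicast_lim=0\n"
after = remove_setting(before)
# If nothing is left, the file held only this setting and can simply be deleted.
print("file would be empty" if after == "" else after)
```

    Writing the result back to the real file would still need root privileges, so the same caution applies.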

  • by goft20,

    goft20 Jan 7, 2014 6:58 PM in response to jonschwenn
    Level 1 (0 points)

    Thanks!

     

    Is it okay if I went ahead and deleted the file instead because that is the only line that the file contains?

     

    So just to clarify, this script will not run again on startup?

     

    Sorry if I seem dumb, but this is my first Mac after 14 years on Windows...I just don't want to screw up!

  • by jonschwenn,

    jonschwenn Jan 7, 2014 7:13 PM in response to goft20
    Level 1 (10 points)

    Totally fine to remove it, if that was the only line.  The file does not exist on a default install of Mavericks.

     

    The script does not run on startup.  It sets the kernel parameter, then looks for sysctl.conf and adds the parameter there as well to make the setting persist after a reboot.
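
    The persistence half of that can be sketched roughly as follows. The script itself is not reproduced in this thread, so this Python version is an assumption about its logic rather than its actual contents: `ensure_setting` is a hypothetical helper that works on the text of sysctl.conf, and the runtime change to the kernel parameter is left out.

```python
KEY = "net.link.ether.inet.arp_unicast_lim"

def ensure_setting(conf_text, key=KEY, value=0):
    """Return conf_text with exactly one up-to-date 'key=value' line for key."""
    wanted = f"{key}={value}"
    lines = []
    found = False
    for line in conf_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not found:
                # Replace the first existing entry with the wanted value.
                lines.append(wanted)
                found = True
            # Any later duplicate entries for the same key are dropped.
        else:
            lines.append(line)
    if not found:
        lines.append(wanted)
    return "".join(line + "\n" for line in lines)

# An empty string stands in for a missing sysctl.conf (the default on Mavericks).
print(ensure_setting(""), end="")
```

    Running it twice gives the same text, which is why re-applying the fix should not pile up duplicate lines.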
