ashvartsman

Q: Mavericks and Failed ARP causing network drops!

I have been wracking my brain over why, after the Mavericks upgrade, we started seeing dropped packets on our corporate network every 30-60 seconds. Here is an example from a ping:

64 bytes from 10.11.12.13: icmp_seq=135 ttl=63 time=3.705 ms
64 bytes from 10.11.12.13: icmp_seq=136 ttl=63 time=3.473 ms
64 bytes from 10.11.12.13: icmp_seq=137 ttl=63 time=3.811 ms
64 bytes from 10.11.12.13: icmp_seq=138 ttl=63 time=4.110 ms
Request timeout for icmp_seq 139
Request timeout for icmp_seq 140
Request timeout for icmp_seq 141
Request timeout for icmp_seq 142
Request timeout for icmp_seq 143
64 bytes from 10.11.12.13: icmp_seq=144 ttl=63 time=5.417 ms
64 bytes from 10.11.12.13: icmp_seq=145 ttl=63 time=3.587 ms
64 bytes from 10.11.12.13: icmp_seq=146 ttl=63 time=3.744 ms
64 bytes from 10.11.12.13: icmp_seq=147 ttl=63 time=3.486 ms
64 bytes from 10.11.12.13: icmp_seq=148 ttl=63 time=3.466 ms

I think I have found a strange ARP issue that is causing it. In our corporate environment, we run GLBP (Gateway Load Balancing Protocol) on Cisco gear. As a result, the gateway address floats between two devices, so its MAC address changes. It looks something like this in the ARP table:

efl-ashvartsman:~ ashvartsman$ arp -a
? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]
efl-ashvartsman:~ ashvartsman$ arp -a
? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]
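
With GLBP the gateway's MAC flips between runs of 'arp -a', which is easy to miss by eye. Below is a minimal sketch for comparing two snapshots, assuming the macOS 'arp -a' line format shown above (which drops leading zeros from each octet, e.g. 0:7:b4:2:cb:2). The helper names are hypothetical and not part of any Apple tool:

```python
import re
from typing import Optional, Tuple

# One entry of macOS `arp -a` output, e.g.
#   ? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]
#   ? (10.224.165.1) at (incomplete) on en1 ifscope [ethernet]
ARP_LINE = re.compile(r"^\S+ \((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")


def normalize_mac(mac: str) -> Optional[str]:
    """Zero-pad each octet (0:7:b4 -> 00:07:b4); '(incomplete)' becomes None."""
    if mac == "(incomplete)":
        return None
    return ":".join(octet.zfill(2) for octet in mac.lower().split(":"))


def parse_arp_line(line: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (ip, mac, interface) for an ARP entry line, else None."""
    m = ARP_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("ip"), normalize_mac(m.group("mac")), m.group("iface")


def mac_for(snapshot: str, ip: str) -> Optional[str]:
    """MAC the snapshot lists for ip (None if absent or incomplete)."""
    for line in snapshot.splitlines():
        entry = parse_arp_line(line)
        if entry is not None and entry[0] == ip:
            return entry[1]
    return None


def gateway_mac_changed(before: str, after: str, gateway: str) -> bool:
    """True if the gateway resolves to a different MAC in the two snapshots."""
    return mac_for(before, gateway) != mac_for(after, gateway)


if __name__ == "__main__":
    first = "? (10.224.165.1) at 0:7:b4:2:cb:2 on en0 ifscope [ethernet]"
    second = "? (10.224.165.1) at 0:7:b4:2:cb:1 on en0 ifscope [ethernet]"
    print(gateway_mac_changed(first, second, "10.224.165.1"))  # True
```

The zero-padding matters when cross-checking against Wireshark, which prints the same address as 00:07:b4:02:cb:01.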

 

On my Mountain Lion machine, the Mac sends a broadcast ARP request and immediately gets a reply with the new MAC address:

No.    Time            Source            Destination       Protocol  Length  Info
258    26.783206000    Apple_78:29:dd    Broadcast         ARP       42      Who has 10.224.165.1?  Tell 10.224.165.55
259    26.786929000    Cisco_e0:ff:40    Apple_78:29:dd    ARP       60      10.224.165.1 is at 00:07:b4:02:cb:01

This happens seamlessly in the background, and no packet loss is observed. However, Mavericks seems to be doing something completely different, and WRONG. It sends five UNICAST requests, one second apart, to the MAC address it had cached before (ARP should always be broadcast!). Only after all five fail does it finally send a BROADCAST request, as shown below. This causes roughly a 5-second network outage for the machine.

No.    Time            Source            Destination       Protocol  Length  Info
394    67.052366000    Apple_b9:a6:b2    Cisco_02:cb:02    ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
395    68.053450000    Apple_b9:a6:b2    Cisco_02:cb:02    ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
396    69.053595000    Apple_b9:a6:b2    Cisco_02:cb:02    ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
397    70.053893000    Apple_b9:a6:b2    Cisco_02:cb:02    ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
398    71.054363000    Apple_b9:a6:b2    Cisco_02:cb:02    ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
399    72.054466000    Apple_b9:a6:b2    Broadcast         ARP       42      Who has 10.224.165.1?  Tell 10.224.165.225
400    72.058079000    Cisco_e0:ff:40    Apple_b9:a6:b2    ARP       60      10.224.165.1 is at 00:07:b4:02:cb:01

Here is the ARP table during this period:

macsccmtest:~ administrator$ arp -a
? (10.224.165.1) at (incomplete) on en1 ifscope [ethernet]
? (10.224.165.220) at f0:b4:79:21:4c:ec on en1 ifscope [ethernet]

My hunch is that Apple did this to reduce bandwidth utilization on the network, but it will cause BIG problems on corporate networks that use GLBP, or any other protocol that provides redundancy across multiple devices!

Anyone else seeing this?  Everyone in my office who has moved to Mavericks can replicate this behavior.

OS X Mavericks (10.9)

Posted on Oct 25, 2013 11:12 AM

  • by goft20, Jan 7, 2014 7:16 PM, in response to jonschwenn

    So out of curiosity, even if the file is not there, will the script still look for the file?

  • by jonschwenn, Jan 7, 2014 7:23 PM, in response to goft20

    Here is the code:

    https://raw.github.com/MacMiniVault/Mac-Scripts/master/unicastarp/unicastarp.sh

    The first if statement makes sure the OS is OS X 10.9.x, not Linux or an older version of OS X.

    If that is true, the script checks whether the /etc/sysctl.conf file exists (second if statement).

    If it does, we just look for the word 'unicast' in the file. If we get a hit (third if statement), we stop the script and assume the parameter has already been set.

    With that out of the way, the script sets the parameter in the live environment. It also uses the 'tee -a' command to append the echoed line to the config file, creating the file if it does not already exist. We also set the proper ownership and permissions on the config file.
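
    The three checks described above can be mirrored as a pure function, which makes it easier to see why running the script a second time is harmless. This is a hypothetical Python rendering, not the actual unicastarp.sh: it only decides what to do and builds the new file contents. It does not touch the live value or file ownership/permissions, and the exact setting name is an assumption from outside this thread (the thread only says the script looks for 'unicast'):

```python
import re
from typing import Optional

# ASSUMPTION: the setting the script appends is believed to be
# net.link.ether.inet.arp_unicast_lim=0; the thread only says the script
# looks for the word 'unicast' in /etc/sysctl.conf.
SETTING = "net.link.ether.inet.arp_unicast_lim=0"


def plan_patch(product_version: str, sysctl_conf: Optional[str]) -> str:
    """Decide what the script would do.

    product_version: an OS X version string, e.g. "10.9.1"
    sysctl_conf: contents of /etc/sysctl.conf, or None if the file is missing
    """
    # First if: only OS X 10.9.x (not Linux, not older OS X)
    if not re.match(r"10\.9(\.|$)", product_version):
        return "SKIP"  # hypothetical label; the thread does not say what it prints
    # Second and third ifs: the file exists and already mentions 'unicast'
    if sysctl_conf is not None and "unicast" in sysctl_conf:
        return "PATCH WAS PREVIOUSLY ENABLED"
    # Otherwise the setting is applied live and appended to the file
    return "PATCH ENABLED"


def patched_conf(sysctl_conf: Optional[str]) -> str:
    """File contents after 'tee -a': old contents (if any) plus the setting."""
    existing = sysctl_conf or ""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + SETTING + "\n"


if __name__ == "__main__":
    print(plan_patch("10.9.1", None))  # PATCH ENABLED
    print(repr(patched_conf(None)))
```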

  • by goft20, Jan 7, 2014 7:46 PM, in response to jonschwenn

    So the script will always be active in the background? If it does not find the file, will it keep searching for it later? In other words, will it affect system performance in any way? Thanks!

  • by jonschwenn, Jan 7, 2014 7:51 PM, in response to goft20

    I'm sorry for not explaining that part.  It runs once and that's it.  The script stops executing once it says "PATCH ENABLED" or "PATCH WAS PREVIOUSLY ENABLED".

  • by goft20, Jan 7, 2014 7:56 PM, in response to jonschwenn

    Is there no way to stop the script from ever running again, in essence disabling it? Thanks for responding so fast and being helpful!

  • by jonschwenn, Jan 7, 2014 8:33 PM, in response to goft20

    The script runs once and that is it. It sets a variable in /etc/sysctl.conf to keep the fix enabled at boot. Removing the variable from /etc/sysctl.conf (or removing the file itself) and rebooting reverts the system to the default behavior.
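
    Reverting amounts to deleting one line and rebooting. Below is a hypothetical sketch of that edit applied to the file's contents. The 'unicast' match mirrors the install script's check, and the key name in the example is an assumption. If nothing else is left in the file, deleting it instead has the same effect:

```python
from typing import List


def revert_conf(sysctl_conf: str) -> str:
    """Drop the unicast-ARP setting from sysctl.conf contents.

    Hypothetical helper: treats any non-comment line whose key contains
    'unicast' as the patch, the same word the install script checks for.
    A reboot is still needed afterwards to return to the default behavior.
    """
    kept: List[str] = []
    for line in sysctl_conf.splitlines(keepends=True):
        stripped = line.strip()
        key = stripped.split("=", 1)[0]
        if stripped and not stripped.startswith("#") and "unicast" in key:
            continue  # skip the line the patch appended
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    before = "kern.maxfiles=65536\nnet.link.ether.inet.arp_unicast_lim=0\n"
    print(repr(revert_conf(before)))  # 'kern.maxfiles=65536\n'
```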

  • by William Kucharski, Jan 7, 2014 9:28 PM, in response to a brody

    Once again, not a bug:

    This is a perfectly valid technique for performing ARP cache validation per RFC 1122, section 2.3.2.1:

    http://tools.ietf.org/html/rfc1122#page-22

    Unicast Poll -- Actively poll the remote host by
    periodically sending a point-to-point ARP Request
    to it, and delete the entry if no ARP Reply is
    received from N successive polls.
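
    To make the quoted technique concrete: the cache entry is checked with point-to-point requests, and only after N unanswered polls is it deleted and re-resolved by broadcast. In the sketch below the probes are hypothetical callables standing in for real packet I/O. The RFC does not fix N, so max_polls=5 is chosen only to match the five polls in the capture at the top of the thread:

```python
from typing import Callable, Optional


def validate_entry(cached_mac: str,
                   unicast_probe: Callable[[str], bool],
                   broadcast_probe: Callable[[], Optional[str]],
                   max_polls: int = 5) -> Optional[str]:
    """'Unicast Poll' cache validation in the style of RFC 1122 2.3.2.1.

    unicast_probe(mac) sends one point-to-point ARP Request to mac and
    returns True if a reply came back; broadcast_probe() sends a normal
    broadcast request and returns the MAC in the reply (None if no reply).
    Retry timing between polls is not modelled.
    """
    for _ in range(max_polls):
        if unicast_probe(cached_mac):
            return cached_mac  # old entry confirmed, keep using it
    # N successive polls unanswered: delete the entry and resolve from scratch
    return broadcast_probe()


if __name__ == "__main__":
    # GLBP-like case: the old virtual MAC no longer answers
    sent = []

    def silent_old_mac(mac: str) -> bool:
        sent.append(mac)
        return False

    new_mac = validate_entry("00:07:b4:02:cb:02", silent_old_mac,
                             lambda: "00:07:b4:02:cb:01")
    print(len(sent), new_mac)  # 5 00:07:b4:02:cb:01
```

    If each poll waits about a second, as in the capture, five unanswered polls add up to roughly the 5-second stall reported in the original post.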
  • by XenoPhage, Feb 11, 2014 1:15 PM, in response to ashvartsman

    This is a bug, but it doesn't appear to be a bug with Mavericks. I believe this is a Cisco bug.

    We're having the same problem here with GLBP and Mavericks. The fix provided in this thread solved the problem for us, but I dug a little deeper to determine whether it was limited to Mavericks. From my testing, it does not appear to be.

    Linux provides a neat utility called arping that essentially pings a remote host using unicast ARP requests. If you use that tool to ping the local GLBP address on the router (not the shared address you use as a default gateway), you'll get consistent replies, as you would expect. If you ping the shared address, a few requests (1-3 in my testing) are answered, and then the remote stops responding. Interestingly, if you do this a few times, you'll see that the pings stop as soon as the MAC address in the router's replies changes. Here's an example:

    PROD> $ arping -I em1 192.168.0.1
    ARPING 192.168.0.1 from 192.168.0.123 em1
    Unicast reply from 192.168.0.1 [00:07:B4:00:01:01]  3.581ms
    Unicast reply from 192.168.0.1 [00:07:B4:00:01:02]  3.014ms

    The ping stopped as soon as the MAC address changed.
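
    For anyone repeating this test, here is a hypothetical helper that takes the arping output lines and reports the first reply whose MAC differs from the one before it, which per the observation above is where the replies stop. It assumes the reply format pasted above and does not run arping itself:

```python
import re
from typing import List, Optional

# Reply lines as in the arping output above, e.g.
#   Unicast reply from 192.168.0.1 [00:07:B4:00:01:01]  3.581ms
REPLY = re.compile(r"\w+ reply from (?P<ip>[\d.]+) \[(?P<mac>[0-9A-Fa-f:]+)\]")


def reply_macs(lines: List[str]) -> List[str]:
    """MAC address from each reply line, lower-cased, in order."""
    return [m.group("mac").lower() for m in map(REPLY.search, lines) if m]


def first_mac_change(lines: List[str]) -> Optional[int]:
    """Index (among reply lines) of the first reply whose MAC differs from the previous one."""
    macs = reply_macs(lines)
    for i in range(1, len(macs)):
        if macs[i] != macs[i - 1]:
            return i
    return None
```

    On the two replies above it returns 1, i.e. the second reply, the one from ...:01:02.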

    I have an open case with Cisco where I'm referencing this problem as well as this thread.  Hopefully more information will follow.  If I receive any viable information, I'll pass it on.

  • by XenoPhage, Feb 12, 2014 1:19 PM, in response to ashvartsman

    So I spoke with Cisco, and they're pointing a finger at Apple. Per Cisco, Apple has implemented this improperly: they claim the MAC address is removed from the ARP table before the unicast ARP requests are sent. Can anyone confirm this is the case?

    They also pointed me to this page, which explains the issue and notes that 10.9.2 is rumored to resolve it: http://www.macstadium.com/blog/osx-10-9-mavericks-bugs/

  • by Akira Okumura, Feb 12, 2014 1:26 PM, in response to XenoPhage

    I have the same issue with a non-Cisco product after upgrading to Mavericks.

  • by Philip Ershler, Feb 12, 2014 7:16 PM, in response to XenoPhage

    I have a friend who has a new MacBook Pro. When it was running 10.9.0, everything was fine. But when we updated it to 10.9.1, the wireless would not hold a connection for more than a minute or so, even though he was sitting right next to his Time Capsule. He spent almost two hours with a very responsive guy at the local Genius Bar. The Genius finally decided to roll the machine back to 10.9.0 and wait for the next release to see whether the problem gets fixed. Once the machine was installed again with 10.9.1, everything is fine. I don't know whether this has anything to do with your issue. BTW, I run 10.9.1 on my older MBP with no issues. Phil

  • by rossoneri91, Feb 28, 2014 6:44 AM, in response to jonschwenn

    I recently changed my router. Now my Mac connects and stays connected, but I intermittently have no internet access, sometimes until I turn Wi-Fi off and back on. The new router is a Samsung SMT-G7400/XEN, and I think it's dual-band.

    How can I fix this? I'm not even sure it's the same problem other people here are having, because most of this discussion goes over my head. I'm on a late 2013 13" MBPr running 10.9.2.

  • by nhira, Mar 16, 2014 4:34 PM, in response to ashvartsman

    This problem still exists with 10.9.2. None of the other solutions suggested in this thread fix the problem for me.

    I just spent about 20 minutes on the phone with Apple Support, and they couldn't figure it out either.

    I'm connected directly to the cable modem (no Wi-Fi or router).

  • by Peter-Erik, Mar 18, 2014 6:38 AM, in response to nhira

    I'm running into the same problem, but if I switch from the built-in Ethernet to USB Ethernet, the problem is gone (Mac mini).

    I filed a bug report at bugreport.apple.

  • by Peter-Erik, Mar 19, 2014 2:42 AM, in response to Peter-Erik

    I found a suggested fix on a forum: duplicate the Ethernet port and use the duplicate as the main Ethernet port, but don't delete the old port (and don't forget to set the service order). I have no time to check now, but on Friday I'll change it on my Mac mini server and post the result here.
