PMTU-D Black Hole Detection Missing? Cause of some conn hangs.

In looking through the kernel source, it appears that Apple has left out an important safeguard for Path MTU Discovery (RFC 1191): the black hole detection suggested in RFC 2923. Since Path MTU Discovery is enabled by default, this may cause some of you to see 15-minute hangs, followed by termination of the connection, when large packets are sent to specific hosts.

Other than DNS and wireless network drops, MTU settings appear to be one of the most problematic things going on with OS X right now.

For those who are not familiar with MTU, here's a brief rundown.

10/100 Ethernet networks support a maximum frame size of around 1514 bytes. This is the largest a packet can be and still be put on an Ethernet network (and be within spec). Gigabit Ethernet allows for larger frames, but we won't go into that.

You're probably more used to hearing 1500. That is the MTU for IP (the Ethernet payload), since the Ethernet header itself is 14 bytes.

In that 1500 bytes, you have to fit your IP header, ICMP/TCP/UDP header, and any higher layer protocols and data, each layering on top of the next.

|<--Ethernet (14 Bytes)--><--IP (20 Bytes)--><--TCP (20 Bytes)--><--Data (1460 Bytes)-->|

So that is how things look on a local area network.

Once the WAN comes into play, the Ethernet header is stripped off leaving only the IP packet and another header put in its place to get it over the next link. This process goes on and on and on until the packet finally reaches its destination.

But here's the problem - what if there is ANOTHER layer between the Ethernet and IP stack?

This is actually quite common and you're probably using it now. The protocol PPP over Ethernet (PPPoE) fits between the Ethernet header and the IP header and adds another 8 bytes to this packet size.

So now we end up with:

|<--Ethernet (14 Bytes)--><--PPPoE (8 Bytes)--><--IP (20 Bytes)--><--TCP (20 Bytes)--><--Data (1452 Bytes)-->|

Notice that we now can't put as much data in this packet or we'll end up with a packet that is too big to fit on the Ethernet network.
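The arithmetic behind the two diagrams can be checked with a short sketch. The header sizes come straight from the diagrams above and assume no IP or TCP options; the function and constant names are made up for illustration.

```python
# Sizes in bytes, as shown in the diagrams (no IP or TCP options assumed).
ETHERNET_FRAME_MAX = 1514
ETHERNET_HEADER = 14
PPPOE_HEADER = 8
IP_HEADER = 20
TCP_HEADER = 20


def max_tcp_data(extra_headers=()):
    """Return the most TCP data that fits in one full-size Ethernet frame.

    extra_headers lists the sizes of any layers sitting between
    the Ethernet header and the IP header (PPPoE, tunnels, ...).
    """
    ip_mtu = ETHERNET_FRAME_MAX - ETHERNET_HEADER - sum(extra_headers)
    return ip_mtu - IP_HEADER - TCP_HEADER


print(max_tcp_data())                # plain Ethernet: 1460
print(max_tcp_data([PPPOE_HEADER]))  # Ethernet + PPPoE: 1452
```

Every extra layer passed in `extra_headers` comes directly out of the space left for data.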

The PPPoE header will ultimately get taken off once the packet gets where it needs to go at your ISP, but there may be other 'tunnels' between you and your ultimate destination that continue to eat into how much data you can put into a packet.

So how do you know how much data you can put into a packet when you don't own or know anything about the network between you and the destination?

That's where Path MTU Discovery comes into play.

It used to be that IP packets would be fragmented (split up) if a packet was too big to get put on the next network. This process of fragmentation causes overhead for both the router having to split up the packets and the receiving device that has to put them all back together again (and make sure they go in the right order).

So in order to reduce this overhead and also ensure that you are always sending the largest packets possible from end to end, IP stacks started setting the 'Don't Fragment' bit in the IP header. This instructs routers to throw away the packet if it is too big when it gets there.

When the packet gets thrown away due to it being too large, the router that throws it away also sends an ICMP packet (an IP diagnostics message) back to the sender telling it what the MTU is of the interface that couldn't take the packet. The sender can then re-calculate things based on that value and resend.
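Here is a toy simulation of that exchange, just to make the loop concrete. It is a sketch, not how any real IP stack is written: the path is a plain list of link MTUs, the return value of `send_with_df` stands in for the ICMP message, and all names and example values are invented for illustration.

```python
def send_with_df(packet_size, link_mtus):
    """Walk a 'Don't Fragment' packet along the path.

    Returns None if the packet arrives, otherwise the MTU of the first
    link that could not take it (standing in for the ICMP reply).
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu
    return None


def discover_path_mtu(first_hop_mtu, link_mtus):
    """Shrink the packet each time a router reports a smaller MTU."""
    size = first_hop_mtu
    attempts = 1
    while True:
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size, attempts
        size = reported  # re-calculate based on the reported value, then resend
        attempts += 1


# Hypothetical path: LAN, PPPoE link, some tunnel, far-end LAN.
print(discover_path_mtu(1500, [1500, 1492, 1400, 1500]))  # (1400, 3)
```

Each drop costs one extra attempt, so the sender settles on the smallest MTU along the path after a handful of tries.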

This works great EXCEPT when there are firewalls in the way (or broken routers, which is less likely these days). Many firewalls will not allow these ICMP messages to go back to the sender. Therefore, your host never receives the message that it is supposed to reduce the size of the packet and keeps trying and trying for about 15 minutes until it finally dies.

This is one reason why you may be seeing long hangs that ultimately end in termination of your connection.

RFC2923 goes into some options to work past this issue.

One way to do this (Windows Vista does it, for instance) is for the system to keep an eye on how many max-sized packets get retransmitted. After a certain number (let's say 5), the system assumes that it is not getting the ICMP notification and cuts the size of that packet in half so it can get the data through, on the assumption that smaller packets are better than no packets getting sent.

It may also (and does with Vista) temporarily disable the setting of the 'Don't Fragment' bit and allow the routers to just take care of things. So in Vista, you'll see the page stutter for a second, and then continue to load, where an OS X system will sit there and hang for 15 minutes.
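The retransmit-counting approach can be sketched as follows. This is only an illustration of the idea as described in this post, not Vista's actual algorithm: the threshold of 5 is the "let's say" figure from the text, the 68-byte floor is the minimum MTU every IPv4 link must support, and the function names are hypothetical.

```python
FAILURE_THRESHOLD = 5  # illustrative value; real stacks pick their own
MIN_PACKET_SIZE = 68   # smallest MTU an IPv4 link is required to support


def delivered(packet_size, path_mtu):
    """A black-hole path: oversized packets vanish and no ICMP comes back."""
    return packet_size <= path_mtu


def send_with_black_hole_detection(packet_size, path_mtu):
    """Resend until delivered, halving the size after repeated silent losses."""
    failures = 0
    total_sends = 0
    while True:
        total_sends += 1
        if delivered(packet_size, path_mtu):
            return packet_size, total_sends
        failures += 1
        if failures >= FAILURE_THRESHOLD:
            if packet_size <= MIN_PACKET_SIZE:
                raise RuntimeError("path drops even minimum-size packets")
            # Assume the ICMP notification is being blocked and shrink.
            packet_size = max(packet_size // 2, MIN_PACKET_SIZE)
            failures = 0


# 1500-byte packets into a path that silently drops anything over 1492.
print(send_with_black_hole_detection(1500, 1492))  # (750, 6)
```

Without the halving step, the loop above would never return for this path, which is the hang described earlier.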

This is where OS X goes wrong. This behavior is called PMTU-D Black Hole Detection and does not appear to be in the IP stack for Leopard (and probably not previous releases).

So what can you do?

You have a few options, some of which I've already provided to a few folks (although without the mathematics so it's just a rough guess value).

First, you can just disable PMTU-D. The command to do this is:

sudo sysctl -w net.inet.tcp.path_mtu_discovery=0

This is a 'quick fix' but does eliminate the benefits that PMTU-D provides.

Second, you can calculate out what size MTU seems to work for you by working backwards and configure that on something within your control.
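One way to do the working backwards is a binary search over probe sizes. The sketch below assumes you have some way to test whether a payload of a given size gets through with 'Don't Fragment' set (a ping, for example); that test is represented by a hypothetical `probe` function, and `fake_probe` just simulates a path so the example runs on its own. The 28 bytes added back at the end are the 20-byte IP header plus the 8-byte ICMP echo header.

```python
IP_HEADER = 20
ICMP_HEADER = 8


def largest_working_payload(probe, low=0, high=1472):
    """Binary-search the largest payload for which probe(payload) is True.

    Assumes probe(low) succeeds and that anything smaller than a
    working size also works.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(payload, hidden_path_mtu=1480):
    """Stand-in for a real DF ping; pretends the path MTU is 1480."""
    return payload + IP_HEADER + ICMP_HEADER <= hidden_path_mtu


payload = largest_working_payload(fake_probe)
print(payload, payload + IP_HEADER + ICMP_HEADER)  # 1452 1480
```

The second number printed is the MTU you would then configure on the router or interface.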

If your home router supports it, that's a good place to reduce the MTU since it only comes into play when you're using your Internet connection and not when hosts within the same network talk to each other. So if you place the MTU of 1472 on your router and your host sends it a 1500 byte packet, it will send back the ICMP message telling you to reduce it down to 1472.

If your router doesn't support it, you can reduce the MTU on your Mac's physical interface. This isn't always the best solution, since you really should then reduce the MTU on each of your local systems or you could run into issues locally.

The command to do this is:

sudo ifconfig en1 mtu 1472

To make this permanent for Ethernet, set it in the Network settings. For AirPort, search the forums: I posted a manual change you can make to one of the preferences files to do this (I don't remember which file right now).

I have found a couple of sites (Washington Mutual's website, for instance) that appear to have configuration issues internal to their network: a device behind the firewall (possibly the web server, a load balancer, or added IPSec) has an MTU of less than 1500 set on it, AND the firewall blocks the ICMP packets from coming back. These sites will throw off your math, since you can no longer assume a max size of 1500 for IP packets. In this specific case, you have to assume 1480.

Third, you can adjust the MSS (Max Segment Size) setting in the kernel to a value that is 40 bytes smaller than what you would otherwise set the MTU to (20 bytes each for the IP and TCP headers, as in the diagrams above). This ensures that the TCP stack never puts more than that amount of data in any single packet, which eliminates the MTU issue for TCP. However, this will not work for UDP.

Finally, you can submit a bug report to let Apple know that PMTU-D Black Hole detection is something that we need.

So what kind of impact does this have on performance?

This will depend on what solution you choose, what the performance of your home router is, and the load on the various servers that have to potentially re-assemble the packets.

That said, even after knocking things all the way down to 1400 bytes, I am still able to get at least 15 Mbps upstream and downstream over the Internet.

If you have any questions on this post, please post and I'll do my best to respond. Hopefully this will help one more person resolve their performance issues with Leopard.

Mac OS X (10.5.1)

Posted on Dec 29, 2007 5:18 PM
