Awful Mac-specific TCP "Upstream" Performance Problems - TCP Stack Issues?
I've been analyzing some really odd performance problems. Let me lay out the framework here to make this as clear as possible. Three machines will be involved here to explain the scenario:
Linux-A <-> WRT54G/DD-WRT <-WDS Span-> WRT54G/DD-WRT <-> [ Linux-B & Mac-A ]
That's Linux-A as a server on the far end of the wireless link, a pair of WRT54G's with DD-WRT (23beta1 on one, 11/04/05 Beta2 on the other) joined by a WDS link, and Linux-B and Mac-A as two workstations on the same switch on the local end. Linux-B and Mac-A essentially have an equal path over the WDS link to Linux-A on the far end.
I'll list the approximate transfer speeds of a given large test file via FTP between the various machines:
Linux-A -> Linux-B: 1.4 Megabytes per second
Linux-B -> Linux-A: 1.4 Megabytes per second
Linux-A -> Mac-A: 1.4 Megabytes per second
Mac-A -> Linux-A: 40-60 Kbytes per second
Mac-A -> Linux-B: 10 Megabytes per second
Linux-B -> Mac-A: 10 Megabytes per second
So, in a nutshell, the two local workstations sharing a switch at 100 Mbit/full-duplex hit about 10 megabytes per second between them... about what you would expect. Anything in either direction between the two Linux boxes maxes out the wireless link at 1.4 megabytes per second. I'm quite happy with that, all in all.
Transfers from Linux-A to Mac-A across the wireless link are equal to the two Linux boxes talking to each other: a healthy 1.4 megs per second.
The problem, though, is that any transfer FROM Mac-A to Linux-A across the wireless link starts fast for half a second, then plummets into the 40-60 Kbytes per second range. It's absolutely horrible in terms of performance.
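To put the numbers above in perspective, here is a rough sketch of the arithmetic. It assumes 1 MB = 1000 KB, uses 50 KB/s as the midpoint of the observed 40-60 KB/s range, and uses the 212 MB size of the test file; the function names are just for illustration.

```python
# Rates from the list above, in KB/s (assumption: 1 MB = 1000 KB).
RATES_KB_PER_S = {
    "Linux-A -> Linux-B": 1400,
    "Linux-A -> Mac-A": 1400,
    "Mac-A -> Linux-A": 50,      # midpoint of the observed 40-60 KB/s
    "Mac-A -> Linux-B": 10000,
}
FILE_SIZE_KB = 212 * 1000        # the 212 MB test file


def transfer_seconds(size_kb, rate_kb_per_s):
    """Time needed to move size_kb at a steady rate_kb_per_s."""
    return size_kb / rate_kb_per_s


def slowdown(fast_rate, slow_rate):
    """How many times slower slow_rate is than fast_rate."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    for path, rate in RATES_KB_PER_S.items():
        minutes = transfer_seconds(FILE_SIZE_KB, rate) / 60
        print(f"{path}: {minutes:.1f} min for the test file")
    factor = slowdown(RATES_KB_PER_S["Linux-A -> Mac-A"],
                      RATES_KB_PER_S["Mac-A -> Linux-A"])
    print(f"Mac upload is roughly {factor:.0f}x slower than the download")
```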
Here are a few oddities I'll throw in for possible clues:
- The Mac defaults to 100mbit/full-duplex - same as the other machines. If, however, I force the Mac to 10mbit/full-duplex (which isn't even IEEE legit, I don't think), I can hit speeds of about 1.05 megabytes per second on transfers from the Mac. This is putting me close to the performance of the other machines. Any other combination, though, of speed and duplex is as bad or worse than the 100mbit/full that it should be using.
- The AirPort wireless interface in my Mac behaves rather well; only the ethernet link exhibits these terrible upstream speed issues. Switching to the AirPort and going through a nearby wireless access point gets my speeds up into the hundreds of K/sec, close to the speeds the other machines achieve over ethernet. This seems to be VERY ethernet specific.
- For grins, I've FTP'd from Linux-A into Mac-A and pulled the file rather than pushing it from the Mac like normal. No difference in performance (not that there should be... but it pays to test everything).
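As a side calculation on the 10mbit observation in the list above, this sketch converts between link speed and transfer rate. It ignores Ethernet/IP/TCP framing overhead (an assumption), so the real ceilings are somewhat lower. It may or may not be significant that the forced 10 Mbit rate sits just under what the wireless link sustains.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Raw line rate in MB/s for a link speed in Mbit/s (no overhead)."""
    return mbit_per_s / 8


def mbyte_to_mbit_per_s(mbyte_per_s):
    """Link speed in Mbit/s needed to carry a rate given in MB/s."""
    return mbyte_per_s * 8


if __name__ == "__main__":
    ceiling = mbit_to_mbyte_per_s(10)          # forced 10 Mbit link
    print(f"10 Mbit ceiling: {ceiling} MB/s")
    print(f"1.05 MB/s is {1.05 / ceiling:.0%} of that ceiling")
    print(f"1.4 MB/s over wireless is about "
          f"{mbyte_to_mbit_per_s(1.4):.1f} Mbit/s")
```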
I've made a few misc adjustments to parameters in the TCP/IP stack under OS X (turning on and off dynamic window sizing, for instance) to see if I could cause a performance change. For the most part, things remain the same or get worse. I've not found any parameters that seem to have any real impact. My first foray into adjusting the TCP/IP kernel settings was the Broadband adjustment tool Apple made available recently. It made the problem worse.
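For anyone wanting to track exactly which kernel parameters a tuning tool changes, one approach is to save the output of `sysctl net.inet.tcp` before and after and compare the two. The sketch below only parses and diffs saved text. The parameter names and values in the sample are illustrative assumptions, not readings from my machines, so check the names against your own `sysctl -a` output.

```python
def parse_sysctl(text):
    """Parse 'name: value' or 'name = value' lines into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for sep in (": ", " = ", "="):
            if sep in line:
                name, value = line.split(sep, 1)
                settings[name.strip()] = value.strip()
                break
    return settings


def diff_settings(before, after):
    """Return {name: (old, new)} for every setting that differs."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


if __name__ == "__main__":
    # Hypothetical snapshots; the values are made up for the example.
    before = parse_sysctl("net.inet.tcp.sendspace: 32768\n"
                          "net.inet.tcp.rfc1323: 1\n")
    after = parse_sysctl("net.inet.tcp.sendspace: 65536\n"
                         "net.inet.tcp.rfc1323: 1\n")
    for name, (old, new) in diff_settings(before, after).items():
        print(f"{name}: {old} -> {new}")
```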
I've also done some very detailed comparative ethereal packet sniffs of the transfers at various speeds and between various machines and the Mac. Most of the transfers look roughly the same on the surface. The FTP Data packets are 1448 bytes in all cases and the window sizes seem to follow the same trends. What I can say is that the 100-mbit connection from the Mac is far less "smooth" with longer delays between FTP-Data packets, far more DUP-ACKs coming in waves, etc.
It's roughly like the Mac is blowing past the bandwidth of the link initially, throttling back (as is normal for TCP while it gets its bearings) but never recovering properly. It's left with a hideous outgoing transfer rate and a seemingly higher retransmit and DUP-ACK rate than, say, 10mbit/full-duplex. Again, on the surface, the packet dumps between Linux-A and the Mac and between Linux-B and the Mac seem very similar, except that the former crawls and the latter runs like a top.
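To turn "far more DUP-ACKs coming in waves" into a number, the ACK numbers of the segments coming back from the receiver could be exported from a capture and counted. This is a simplified sketch: it treats any ACK repeating the previous ACK number as a duplicate and does not check payload length or window size as a full TCP implementation would. The input list is hypothetical.

```python
def count_dup_acks(ack_numbers):
    """Count duplicate ACKs and runs that reach three duplicates.

    ack_numbers: ACK field values of the receiver's segments, in
    capture order. Three duplicates in a row is the usual trigger
    for a fast retransmit on the sender.
    """
    dup_total = 0
    triple_dup_events = 0
    run = 0
    previous = None
    for ack in ack_numbers:
        if ack == previous:
            run += 1
            dup_total += 1
            if run == 3:
                triple_dup_events += 1
        else:
            run = 0
        previous = ack
    return dup_total, triple_dup_events


if __name__ == "__main__":
    # Hypothetical ACK stream, stepping by the 1448-byte segment size.
    sample = [1000, 2448, 2448, 2448, 2448, 3896]
    print(count_dup_acks(sample))
```

Running the same count over the slow and fast captures would give a like-for-like comparison instead of an eyeball impression.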
Lastly, this problem appears to be OS X/Mac specific in general. I have two different generation 17" Powerbooks on hand and both of them behave exactly the same way in terms of these tests. I also have an older Cube running OS X Panther Server and it behaves precisely the same way.
For illustration's sake, I also generated two Time/Sequence graphs of the FTP-DATA packets being transmitted. They cover only part of each transfer, but there is enough there to properly illustrate the timing behavior. A good 10mbit/full-duplex transfer is here:
http://sparhawk.sbc.edu/dd-wrt/10mbit.jpg
And the choppy, 100mbit transfer is here:
http://sparhawk.sbc.edu/dd-wrt/100mbit.jpg
Again, keep in mind that incoming data to Mac-A seems fine and runs on par with the other boxes. Only UPLOADS from Mac-A across the wireless link are dismal.
As I've said, performance within the LAN here is perfectly good and that has been my experience elsewhere, too... but crossing the WDS link seems to trigger some bit of insanity.
I've just put two Ethereal packet captures online.
http://sparhawk.sbc.edu/dd-wrt/mac.cap
This is the Mac uploading files at an abysmal rate. The FTP-DATA packets start at packet #99. Packet #94 is the SYN requesting a 2x multiplier on the window size. The SYN-ACK responds with a window scale of 0.
http://sparhawk.sbc.edu/dd-wrt/linux.cap
This is the Linux box doing the same file transfer at a perfectly good rate. The FTP-DATA packets here start at packet #112. Packet #107 is the SYN requesting a 4x multiplier on the window size. The SYN-ACK responds with a window scale of 0.
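For reference, here is a sketch of how the window scale values translate into actual window sizes. It assumes the "2x" and "4x" multipliers correspond to shift counts of 1 and 2, and that a scale of 0 means a shift of 0 (a 1x multiplier). Each side's shift applies to the window field in the segments that side sends. If the SYN comes from the uploading machine and the reply from Linux-A (an assumption about the capture), then Linux-A's advertised receive window is capped at 65535 bytes in both captures. The 10 ms round-trip time below is purely hypothetical.

```python
MAX_RAW_WINDOW = 0xFFFF  # the TCP window field is 16 bits wide


def effective_window(raw_window, shift):
    """Window in bytes after applying the sender's scale shift."""
    return raw_window << shift


def max_throughput_bytes_per_s(window_bytes, rtt_seconds):
    """Upper bound on throughput with one window in flight per RTT."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    for label, shift in (("scale 0 (reply)", 0),
                         ("2x (Mac SYN)", 1),
                         ("4x (Linux SYN)", 2)):
        window = effective_window(MAX_RAW_WINDOW, shift)
        print(f"{label}: up to {window} bytes")
    # Hypothetical RTT of 10 ms across the WDS link.
    limit = max_throughput_bytes_per_s(
        effective_window(MAX_RAW_WINDOW, 0), 0.010)
    print(f"Window-limited ceiling at 10 ms RTT: {limit / 1e6:.1f} MB/s")
```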
The test file is 212 megs in size, so I obviously cut off the transfers very quickly so the dumps wouldn't be so big. They are both fully established in their behavior (slow or fast) by the time I stopped each capture, so there is plenty of sample data here.
One other thing to point out. Ethereal ALWAYS flags the packets originating from the Mac as having an invalid checksum... though there is no evidence this is true and nothing retransmits as a result. I'm thinking ethereal and the hardware checksumming on the Mac don't agree with each other in some odd way. My transfers are slow, but they are never corrupted.
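A common explanation for this kind of false alarm is checksum offload: if the NIC fills in the checksum after the point where the sniffer sees the outgoing packet, the capture on the sending machine shows a placeholder value. For anyone who wants to check a packet by hand, the sketch below implements the standard Internet checksum arithmetic (RFC 1071). It shows only the summing step; for a real TCP segment the input would also have to include the pseudo-header, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:             # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")   # example bytes from RFC 1071
    checksum = internet_checksum(sample)
    print(f"checksum: {checksum:#06x}")
    # Data with its correct checksum appended sums to zero.
    print(internet_checksum(sample + checksum.to_bytes(2, "big")) == 0)
```

Capturing the same transfer on the receiving side would show whether the checksums are valid on the wire.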
If I can answer any questions for anyone pondering this more deeply, please let me know. I can also tweak numerous settings within the OS X TCP stack, so if somebody has suspicions, let me know which parameters to experiment with.
Thanks!
- Aaron
Powerbook 17" - 1.66 GHz - 2GB RAM Mac OS X (10.4.3)
Powerbook G4 17" - 1.67 GHz - 2 GB RAM, Mac OS X (10.4.3)