SSHD Processes Using 100% CPU

May 19, 2008 8:08 AM in response to L. Mariachi

I'm seeing this problem on our Mac mini "server" machine as well, and we've had IPv6 off from the beginning. I'm hoping they get this fixed in 10.5.3 because it's a huge pain.

May 19, 2008 8:11 AM in response to L. Mariachi

IPv6 has been off on my two Xserves as well (for some time), so this isn't the solution on my end either.

May 21, 2008 4:41 AM in response to Worsham

I'm having this same problem. This is a very serious bug and it eats away at the available CPU: each hung sshd uses 100% of a CPU. My machine has 8 processors, so losing one or two isn't a big deal, but if I weren't paying very close attention to login/logout/timeout, it would eat all 8 CPUs. Having paid as much for the machine as I did, I'd expect sshd to work.

The specifics of the problem: after a reboot, login/logout/timeout work correctly, but after a few days login and timeout stop working and leave a hung sshd process at 100% CPU. With login, it prompts "Password:", the password goes in, but then it hangs. A second login attempt usually works (95% of the time), but checking top -u then shows a hung sshd process left over from the first attempt. For timeout: if my network connection drops (like going from work to home on a laptop) once the server has been up for a few days, the sshd on the server doesn't quit like it should but hangs at 100% CPU.

This only happens when I log in, not during the zillion login attempts by bots.

All I can say is thank God there are 8 CPUs in the machine, because this could be a nasty bug on a single-CPU machine.
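If you'd rather not keep a constant eye on top, the check described above can be scripted. This is a minimal sketch in Python, assuming a ps that accepts -axo pid,pcpu,comm (the BSD ps on Mac OS X does, and so does Linux procps); the 90% threshold and the names find_hung_sshd and read_ps are illustrative choices, not anything from this thread.

```python
import os
import subprocess


def find_hung_sshd(ps_output, threshold=90.0):
    """Return (pid, pcpu) pairs for sshd processes at or above `threshold` % CPU.

    `ps_output` is the text printed by `ps -axo pid,pcpu,comm`;
    its first line is the column header and is skipped.
    """
    hung = []
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 2)  # pid, %cpu, command (command may contain spaces)
        if len(fields) < 3:
            continue
        pid, pcpu, comm = fields
        # On Mac OS X, comm is usually a full path such as /usr/sbin/sshd.
        if os.path.basename(comm.strip()) == "sshd" and float(pcpu) >= threshold:
            hung.append((int(pid), float(pcpu)))
    return hung


def read_ps():
    """Capture the current process list in the format find_hung_sshd expects."""
    return subprocess.run(["ps", "-axo", "pid,pcpu,comm"],
                          capture_output=True, text=True, check=True).stdout
```

Usage: print(find_hung_sshd(read_ps())). Note that a busy but healthy session (for example a large scp) can also cross the threshold, so treat the result as a list of candidates to look at, not processes to kill blindly.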

mcdermj

May 26, 2008 3:51 AM in response to Chris Adams3

It seems like it's actually in the pty allocation code. Errno 17 is "File exists" (EEXIST), so from looking at some of the Darwin kernel code, it appears that when the pty allocation process gets the next pty, it is actually getting one that already exists in devfs. I didn't dig too far into the issue because, well, I'm not really a kernel developer, and I really don't want to try debug kernels on my production machine.
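The errno lookup is easy to confirm on any Unix-like system: on both Darwin and Linux, errno 17 is EEXIST. The quick Python check below also shows that the same error comes back whenever something tries to create a filesystem node that is already there; a directory created twice stands in here for the duplicate devfs node, which is only an analogy, not the kernel's pty path.

```python
import errno
import os
import tempfile

# errno 17 maps to the symbolic name EEXIST on Darwin and Linux.
print(errno.errorcode[17], "-", os.strerror(17))

# Creating a node that already exists fails with that errno.
path = tempfile.mkdtemp()
try:
    os.mkdir(path)  # second creation of the same directory
except FileExistsError as exc:
    print("second mkdir failed with errno", exc.errno)
finally:
    os.rmdir(path)  # clean up the temporary directory
```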

I don't know whether it's SSH or the syscall that's looping, but when it can't allocate a pty, it loops over it. It doesn't make much sense, but it's pretty annoying. It has basically made SSH useless for me.
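Nobody here knows where the loop actually lives, but the symptom matches a retry loop that keeps getting EEXIST and never gives up or moves on. The Python sketch below is purely illustrative: try_open_pty, the STALE set and both allocate functions are hypothetical stand-ins, not Darwin's or OpenSSH's actual code. It contrasts retrying the same slot forever (which pins a CPU at 100%) with skipping stale slots and stopping at a limit.

```python
import errno

# Hypothetical: pty numbers whose device nodes were left behind.
STALE = {1, 2, 3}


def try_open_pty(n):
    """Hypothetical allocator step: fails with EEXIST if pty n's node already exists."""
    if n in STALE:
        raise OSError(errno.EEXIST, "File exists")
    return "/dev/ttys%03d" % n


def unbounded_allocate(n):
    """The failure mode: retry the same slot forever on EEXIST.

    With a stale node this never returns and keeps one CPU busy,
    so it is deliberately not called below.
    """
    while True:
        try:
            return try_open_pty(n)
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise


def bounded_allocate(start, max_ptys=128):
    """Skip stale slots and give up at max_ptys instead of spinning."""
    for n in range(start, max_ptys):
        try:
            return try_open_pty(n)
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise
    raise OSError(errno.EAGAIN, "no free pty below %d" % max_ptys)


print(bounded_allocate(1))  # /dev/ttys004
```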

May 28, 2008 6:32 PM in response to Worsham

Does 10.5.3 fix this issue? Please.....

May 30, 2008 2:29 AM in response to MartyGearbox

Unfortunately, it doesn't fix it. :-(((

In our case, our Mac OS X Server (updated to 10.5.3 yesterday) forked a hanging sshd process (consuming 100% of CPU time) just a few hours ago.

Best regards,
Steffen

May 30, 2008 2:44 PM in response to Steffen M.

I installed the 10.5.3 update yesterday morning and had my first crashed sshd just a few minutes ago. Does anyone know how to get this problem escalated on Apple's bug fixing priority list?

Due to the use of Open Directory, I was forced to set up this server as an "upgrade" from Tiger rather than as a fresh Leopard install. Is there anyone out there who is having this sshd crashing problem on a Leopard server that was not installed as a Tiger upgrade? (Open Directory exports from Tiger Server would not import into Leopard Server. Even after I imported the directory in Tiger, upgraded to Leopard, and then exported the Leopard OD, I was unable to import the resulting OD export into a clean Leopard install.)

bluehen

May 31, 2008 5:49 AM in response to jeo-at-mac.com

I'm still having this problem with a clean install of 10.5 on a new Xserve purchased in Dec 07. I did not upgrade from Tiger.

After 10.5.3 came out, I wrote an e-mail to devbugs@apple.com asking if bug id 5685756 was fixed in 10.5.3. Got a response back a day or so later indicating it had not, and their engineers were aware of the issue and were working on it. /sigh

I don't know what we as a community can do to escalate this issue. I suggest that everyone who is having this issue file a bug report, and that everyone who has already filed one periodically inquire about its status using the devbugs e-mail address. It's sad that a bug of this magnitude has gone unfixed this long.

May 31, 2008 9:01 AM in response to bluehen

I don't know of any better way to escalate this, either. I've already filed a bug report and asked about the status of the bug. I got the same reply as you. 😟

As we also use our Xserve running Mac OS X Server 10.5.3 to host a large database for longer-term scientific computing (one calculation runs for about two weeks), this is extremely annoying. We cannot reboot our server nightly, because this would interrupt the computations. Killing the hanging "sshd" processes is not a good idea, either: the "pty" devices that were in use by the hung processes behave a bit like "zombies". When newly initiated "ssh" connections get one of these "zombie devices", this leads directly to new hanging "sshd" processes.

Example: Assume that "/dev/ttys001", "/dev/ttys002" and "/dev/ttys003" are occupied by hanging "sshd" processes. To free CPU resources, I kill these three hanging "sshd" processes. After that, the next four newly initiated "ssh" connections are assigned, for example, to "/dev/ttys001", "/dev/ttys002", "/dev/ttys003" and "/dev/ttys004". The first three newly forked "sshd" processes will hang as well and hog the CPU(s) again; the fourth one will most probably be usable.

We are seriously considering investigating how well Linux or FreeBSD would run on our Xserve (I have no idea whether there are drivers for Apple's Fibre Channel controller to connect the Xserve RAID)...

It's just sad that Apple is taking so long to fix this bug.

Best regards,
Steffen
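Steffen's example can be reproduced with a toy model. It assumes, as his observation suggests but as nobody here has confirmed from the source, that each new connection gets the lowest-numbered pty not held by a live process, and that a pty whose hung sshd was killed stays "poisoned". The function assign_connections and the healthy session on ttys000 are hypothetical.

```python
def assign_connections(held, poisoned, count):
    """Toy model of pty assignment for `count` new ssh connections.

    held     -- pty numbers still held by live processes
    poisoned -- pty numbers whose hung sshd was killed; the number looks
                free again, but a session that lands on it hangs
    Returns a list of (device, hangs) tuples in connection order.
    """
    held = set(held)
    results = []
    for _ in range(count):
        n = 0
        while n in held:  # the lowest free number wins
            n += 1
        held.add(n)  # the new session, hung or not, now holds this pty
        results.append(("/dev/ttys%03d" % n, n in poisoned))
    return results


# Assumption: ttys000 is held by a healthy session; ttys001-003 were
# freed by killing three hung sshd processes.
for device, hangs in assign_connections(held={0}, poisoned={1, 2, 3}, count=4):
    print(device, "hangs" if hangs else "usable")
```

In this model the new hung sessions re-occupy the poisoned numbers, which is why killing them only frees the CPU until the next few connections arrive.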

May 31, 2008 10:55 AM in response to Steffen M.

About the sshd problem: our server has finally been working flawlessly for more than 12 days... WOW!!! Before that, we always ran into sshd problems after 2-5 days.

This is what I did to try to minimize the problem:
• I installed a plain 10.5.2 update without any of the updates published later (Java, Leopard graphics...)
• Right after a restart, I opened 2 Terminal windows (because the sshd problem may also prevent you from opening a Terminal window and logging in)
• I activated the firewall to block ssh connections that are not absolutely necessary (most notably to avoid all the "dictionary attacks")
• As advised in other threads, I never leave the Server Admin application running
• We run only minimal services for now: mail (POP3S, IMAPS), web server (vanilla HTML on ports 80 and 443, and Webmail only on 443), Open Directory, AFP (with 9 machines using a RAID 1 volume for Time Machine backups), Kerberos, Firewall.
• The mail server is stopped and restarted every night for backup with mailbfr
• I connect to the server from the server applications or through VNC (no ARD).

I do not think all that cures the sshd problem, but at least we no longer need to restart the Xserve every 2-5 days.

May 31, 2008 12:24 PM in response to Stéphane Mons

Thank you very much for your hints.

Unfortunately, we cannot block SSH, as our students and scientific staff regularly use this service to access user data from home or elsewhere. For now, we have to live with the problem: we "re-nice" hanging processes and reboot the server between two long-term computations.

Bye,
Steffen
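For the "re-nice" workaround, the standard tool is renice(8), for example renice 19 -p PID run as root. The Python sketch below does the same thing through os.setpriority. It assumes you already have the PIDs (from top or ps), the function name renice is just an illustrative choice, and renicing another user's process, such as root's sshd, requires root. Renicing only lowers the scheduling priority: the process keeps spinning, it just yields the CPU to other work such as long-running computations.

```python
import os


def renice(pids, niceness=19):
    """Set the nice value of each PID; return the PIDs that succeeded.

    19 is the lowest priority on Linux; Darwin also accepts 20.
    """
    done = []
    for pid in pids:
        try:
            os.setpriority(os.PRIO_PROCESS, pid, niceness)
            done.append(pid)
        except (PermissionError, ProcessLookupError) as exc:
            # Not our process (needs root) or it has already exited.
            print("skipping %d: %s" % (pid, exc))
    return done
```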

Jun 1, 2008 9:49 PM in response to Steffen M.

Steffen, thanks for the details on the zombie issue. Now I understand why killing the sshd processes and restarting the ssh service didn't help.

So far, I have submitted a bug report and have repeatedly pestered two Apple education rep teams (both sales and systems engineering), but I'm getting no traction anywhere.

I noticed that this thread started in mid-March and that Stéphane said he was able to minimize the problem by going back to the 10.5.2 update without any of the updates released after 10.5.2. Did anyone have this problem with 10.5.1? Given the security updates between 10.5.1 and 10.5.3, going back doesn't seem like a viable option, but I'm still curious whether the problem was introduced with Leopard itself or with a later Leopard update.

Jun 3, 2008 1:51 AM in response to jeo-at-mac.com

Hi Jeo,

as far as I can see, this thread was already started in December 2007, so it seems the problem already existed in 10.5.1...

Steffen

rprr

Jun 4, 2008 2:24 PM in response to Steffen M.

Just to add that this happens on my MacBook as well, running 10.5.3. The Terminal program climbs to 99% CPU and hangs with no terminal window appearing. My desktop (G5), also running 10.5.3, does not appear to have this issue.

rprr

Jun 4, 2008 2:51 PM in response to rprr

Miraculously, Terminal appears to be working now. I don't know what made it work, but these are the actions I took.

1. Started Terminal. It hung. I force-quit it and sent a crash report to Apple.

2. Since I had a working X11 application, I used it to open about 10 xterms using the Applications->Terminal Menu of X11.app

3. I started Terminal again. Now it works.

I logged out and restarted. It seems to be behaving normally.