
10.6.3 frequent crashes

I've got an Xserve 2.66 GHz with SSD and 12 GB RAM running 10.6.3 Server. This is a new server, and it ran 10.6.2 for 3-4 weeks before I upgraded to 10.6.3. I made the mistake of not testing 10.6.3 before rolling this server into production. This server provides authentication for computers with mobile home directories, the email server, web servers, etc. It is also my main AFP server at the moment. It's running OD, RADIUS, AFP and SMB.

The first thing I noticed is that OD stops working before any other service. This causes all sorts of issues in my environment. I do have another, slightly older Xserve running 10.6.3 (OD replica), and it has been stable so far, although it's not running AFP or SMB.

This morning the server was completely locked up and needed a full restart. I had a 6-port Small Tree Ethernet card installed and removed it, thinking the link aggregation could have been the issue. The issues persist regardless. Other than the card, the machine is exactly as shipped from Apple, including the memory.

I'll be posting log information shortly. I wanted to see if others are having issues with 10.6.3. I've seen a few threads specifically about AFP issues in 10.6.3, so it's possible AFP is the root of the issues here. I do have AppleCare support, so I will be calling them today as well. I wanted to get this thread out there for anyone else who might be having the same issues.

Xserve 2x 2.66 GHz Quad-Core Intel Xeon, Mac OS X (10.6.3), SSD, 12 GB

Posted on Apr 12, 2010 9:35 AM


May 4, 2010 8:41 AM in response to qmp

10.6.3 xserve

Same thing happening here too.

Do most of you have a self-signed certificate in use with Open Directory?
I notice SASL GSSAPI complaining until I turn off the SSL certificate for Open Directory, wait a minute, and then turn the SSL certificate back on.

Can any of you log in with ssh to these crashing servers? Maybe I have multiple issues. I'm seriously thinking of going back to Linux; even though it's harder to configure, it seemed a bit more stable.

May 4, 2010 9:55 AM in response to jpbuse

I too have a new 2x 2.93 GHz machine with 24 GB of 1066 MHz RAM, an SSD and a 6 TB hardware RAID, running 10.6.3. It too freezes up about twice a day. I've got 2500 student accounts and about 125 concurrent users. I upgraded from 10.5.8 to 10.6.3 by restoring from an archive. I reinstalled the OS twice, and it would freeze up even before I restored from the archive.

I had to shut off Time Machine and Spotlight, and I added all the server hard drives to the Privacy tab of the Spotlight preferences in System Preferences. It's run 2 days straight without freezing up!

Hopefully apple will get this fixed. The lab teachers are about to kill me!

May 4, 2010 11:17 AM in response to qmp

Very plausible! My server has not crashed in well over 3 weeks, BUT I have not used WGM or Server Admin on it since then. I've been doing what I can to avoid using either app. When I first reported my crashing issues, the crash always seemed to happen at some point after I had used either app for more than about 5 minutes. It doesn't seem to matter whether I use the app locally on the server or remotely.

I haven't pulled any log info because I have not been using either of the apps and haven't had any issues. Pretty annoying though.

I do use Time Machine locally. Main drive is SSD. This is 10.6.3.

May 4, 2010 11:32 AM in response to bootzus

To add: below is a log excerpt from my server. While sitting at the server, I will sometimes open Terminal; it asks me for a password, I give it my admin password, and the Terminal window then says that /usr/bin/bash does not exist. After this, the Xserve is slow to open new apps, apps that are already open (such as Server Admin and Workgroup Manager) freeze, and any authentication to or on the server simply dies!

************************************************************************
May 4 11:24:12 mercury Software Update[1771]: SWU: scan found 0 products:
May 4 11:24:18 mercury com.apple.launchd[1] (com.apple.suhelperd[1773]): Exited with exit code: 2
May 4 11:26:39 mercury /System/Library/CoreServices/CCacheServer.app/Contents/MacOS/CCacheServer[1210] : No valid tickets, timing out
May 4 11:36:09 mercury DirectoryService[29]: Misconfiguration detected in hash 'User Name' - see /Library/Logs/DirectoryService/DirectoryService.error.log for details
May 4 12:22:14: --- last message repeated 2 times ---
May 4 12:22:14 mercury login[3702]: in pam_sm_authenticate(): Failed to determine Kerberos principal name.
May 4 12:22:20 mercury login[3702]: 1 LOGIN FAILURE ON ttys000
May 4 12:28:04 mercury login[3965]: in pam_sm_authenticate(): Failed to determine Kerberos principal name.
May 4 12:28:19: --- last message repeated 1 time ---
May 4 12:28:19 mercury login[3965]: USER_PROCESS: 3965 ttys000
May 4 12:28:19 mercury login[3990]: bootstrap lookup(com.apple.ReportCrash) failure: Unknown service name (1102)
May 4 12:28:19 mercury login[3965]: DEAD_PROCESS: 3965 ttys000
May 4 12:28:46 mercury Safari[3926]: IPCClient: Server port 0 is invalid; looking it up again...
May 4 12:30:59 mercury login[4087]: in pam_sm_authenticate(): Failed to determine Kerberos principal name.
May 4 12:31:03 mercury login[4087]: 1 LOGIN FAILURE ON ttys000
************************************************************************
end snip
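An excerpt like the one above mixes several processes, and syslog folds duplicate messages into "last message repeated N times" lines, so it can be hard to see at a glance which process is flooding the log. As a rough aid, here is a minimal Python sketch (not an Apple tool) that tallies messages per process. It assumes the line layout shown in the excerpt; the regex, the function name `summarize` and the `SAMPLE` list are illustrative, with the sample lines copied from the log above. The pattern also tolerates a day number padded with an extra space.

```python
import re
from collections import Counter

# Assumed line shape, based on the excerpt above:
#   "<Mon> <day> <hh:mm:ss> <host> <process>[<pid>]<anything>: <message>"
LINE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[^\[]+)\[(?P<pid>\d+)\][^:]*: ?(?P<msg>.*)$"
)
# syslog collapses duplicates into "--- last message repeated N time(s) ---"
REPEAT = re.compile(r"--- last message repeated (\d+) times? ---")


def summarize(lines):
    """Count log messages per process, crediting each repeat marker
    to the process that logged the message just before it."""
    counts = Counter()
    last_proc = None
    for raw in lines:
        line = raw.strip()
        rep = REPEAT.search(line)
        if rep:
            if last_proc is not None:
                counts[last_proc] += int(rep.group(1))
            continue
        m = LINE.match(line)
        if m:
            # Keep only the executable name for full paths such as
            # /System/Library/CoreServices/CCacheServer.app/.../CCacheServer
            last_proc = m.group("proc").rsplit("/", 1)[-1]
            counts[last_proc] += 1
    return counts


SAMPLE = [
    "May 4 11:24:18 mercury com.apple.launchd[1] (com.apple.suhelperd[1773]): Exited with exit code: 2",
    "May 4 11:26:39 mercury /System/Library/CoreServices/CCacheServer.app/Contents/MacOS/CCacheServer[1210] : No valid tickets, timing out",
    "May 4 11:36:09 mercury DirectoryService[29]: Misconfiguration detected in hash 'User Name' - see /Library/Logs/DirectoryService/DirectoryService.error.log for details",
    "May 4 12:22:14: --- last message repeated 2 times ---",
    "May 4 12:22:20 mercury login[3702]: 1 LOGIN FAILURE ON ttys000",
    "May 4 12:28:19 mercury login[3965]: USER_PROCESS: 3965 ttys000",
    "May 4 12:28:46 mercury Safari[3926]: IPCClient: Server port 0 is invalid; looking it up again...",
]

if __name__ == "__main__":
    for proc, n in summarize(SAMPLE).most_common():
        print(f"{n:3d}  {proc}")
```

Pointed at a saved copy of /var/log/system.log (for example `summarize(open(path))`), this only counts lines; it does not diagnose anything, but it can show whether login or DirectoryService dominate during the period leading up to a hang.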

May 4, 2010 12:24 PM in response to jpbuse

What we have found here is that even if OD has crashed or broken, the server continues to support applications that don't rely on OD. For example, we run a third-party mail server and a couple of Windows Server apps in a virtual machine running W2K Server; these keep working fine even after calendar, wiki, file access and so on have failed because OD is not working.

It seems this problem really is with OD/LDAP; all that varies is what triggers it (Time Machine seems to be one culprit, but it is not the only one) and how it shows itself.

We've turned off TM (the culprit on our system) and taken to doing periodic backups manually (TM will do an incremental backup on demand even if you turn it 'off'), then restarting the machine immediately afterwards. A bit brutal and labour-intensive, but it seems to work until the problem is fixed.

May 4, 2010 12:27 PM in response to qmp

qmp wrote:
Could someone confirm that this theory might be plausible?


Yes, it seems to be linked to 10.6.3. Our system worked flawlessly until the 10.6.3 update, and then all the OD problems discussed here started. Others seem to report a similar experience, though I have seen a couple of postings saying similar things are happening on some 10.6.2 machines.

May 4, 2010 5:13 PM in response to Gavin Lawrie

There's a grab-bag of different issues here, and some of them occurred for some people well before 10.6.3.

For what it's worth, I have a server running 10.6.3 that has had zero crashes, including OD.
So, there's nothing inherent in 10.6.3 that makes it crash-prone.

It's very important not to conflate separate issues, as that tends to lead one away from careful troubleshooting.

May 5, 2010 1:22 AM in response to davidh

davidh wrote:
So, there's nothing inherent in 10.6.3 that makes it crash-prone.


Except that the majority of people with these problems (ourselves included) only started having the problems after updating to 10.6.3.

For sure it is not clear what is causing the problems, and for sure not everyone experiences them.

But your not having problems is not, unfortunately, proof that there are no problems. Nor is knowing that someone else doesn't have a problem much use when it comes to working out how to fix it.

It's very important not to conflate separate issues, as that tends to lead one away from careful troubleshooting.


Indeed. But right now I think people are trying to work out the characteristics of the problem, and for that, discussing the symptoms seen is about as useful as it gets.

May 5, 2010 4:50 AM in response to Gavin Lawrie

Yes, my point was that there isn't necessarily something inherent to 10.6.3 that is problematic. Outside of possibly some specific issue in some specific situation(s) that could be prompted by the update, this update (10.6.3) is not in and of itself "the" cause of any and all issues that some may be seeing.

Was the system disk verified (Disk Utility or fsck) prior to updating?
Was the system rebooted prior to applying the update?
Were services shut down prior to running the update?

Was a known-good, full backup made prior to applying the update?
If not, why not? Never update a production server without a known-good, full backup to revert to.

As for the problems that are occurring, to be honest, pretty much each posting from each different poster merits its own thread.

Some people are seeing problems with Time Machine and 10.6.3, but Time Machine really is not ideal for backing up OS X Server. Many services must be cleanly stopped before backing them up: mail, calendar, anything using a database (many services do use SQLite databases).

OD, incorporating Apple's implementation of OpenLDAP, also uses a database, so take appropriate precautions when backing it up.
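A plain file copy of a SQLite database taken while a service is writing to it can be inconsistent. Purely to illustrate what a consistent copy involves for the SQLite-backed services mentioned above, here is a Python sketch using the standard library's online backup API, `sqlite3.Connection.backup` (Python 3.7+, far newer than anything bundled with 10.6). The function name `snapshot_sqlite` and all file paths are hypothetical, not real OS X Server locations. It does not cover OD's OpenLDAP database, and it does not replace stopping the service first, since a service can keep state outside the database file.

```python
import os
import sqlite3
import tempfile


def snapshot_sqlite(src_path, dest_path):
    """Copy a SQLite database with the sqlite3 online backup API and
    return True if the copy passes PRAGMA integrity_check."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # Connection.backup requires Python 3.7+
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
        return result == "ok"
    finally:
        dest.close()
        src.close()


if __name__ == "__main__":
    # Self-contained demo on a throwaway database; the file names are
    # hypothetical placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "service.db")
        dest_path = os.path.join(tmp, "service-backup.db")
        conn = sqlite3.connect(src_path)
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
        conn.executemany("INSERT INTO events (note) VALUES (?)", [("a",), ("b",)])
        conn.commit()
        conn.close()
        print(snapshot_sqlite(src_path, dest_path))
```

Checking the copy with `PRAGMA integrity_check` is in the spirit of the "known-good backup" point above: a backup you have not verified is not yet known-good.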

May 5, 2010 5:26 AM in response to davidh

If you don't have any problems then that's nice for you.
Why are you then posting to this thread? Go somewhere else.
As to my case:
I did not have any problems prior to 10.6.3.
I now have frequent crashes with 10.6.3.
I made a clean install and imported the OD data from prior to the upgrade.
The crashes are still there.
So the problem to me is clearly 10.6.3 related.
One thought though:
I installed the 'MacOSXServUpdCombo10.6.3v1.1' on all of my servers.
Maybe that could have something to do with it?

May 5, 2010 5:32 AM in response to qmp

"I made a clean install and imported the OD data from prior to the upgrade"

If at all possible, try not importing the OD data. A staging server to test and compare would be a good step to take.

The v1.1 update should be the newer of the 10.6.3 updates and is the one to go with; Apple updated the update itself.

And what are you seeing in the server logs? What does a "crash" mean: which service is crashing, and what specific behavior are you seeing that should not be occurring?

May 5, 2010 5:44 AM in response to davidh

The problem is that without my 1400 users my OD is useless.
So I really don't care whether the crashes also happen without users...

I posted part of my log previously.

As to the specific behavior:

I have at least four xserves with the same problems.
These are ldap, replica, mail and a file server.
All of them erratically seem to hang.
Services are then no longer available, login is no longer possible on the server itself nor is it accessible via ssh.
The only thing I can then do is to turn off the power.

I repeat though what I said before:
If I do not use Server Admin on these servers, they won't crash.
They seem to be running for days...

May 5, 2010 6:04 AM in response to qmp

"If I do not use Server Admin on these servers, they won't crash.
They seem to be running for days"

Ah-hah! If you mean what I think you mean... 🙂 Don't leave Server Admin or Workgroup Manager running unless you need to use them, i.e. quit them when you're done.

Certainly there's been a problem in the past where leaving SA/WGM running could lead to problems over time.

As a rule, I only run the tools as and when needed, and then quit them when done.

Hope I understood you correctly.

May 5, 2010 6:48 AM in response to davidh

My setup/config:
It was a brand-new server with 10.6.2 preinstalled. I set up the server as standalone, then upgraded via Software Update to the latest updates (not 10.6.3 yet). I changed it to an OD master, then imported my users from a 10.5.8 server. The server runs OD master, AFP, SMB and RADIUS. It ran just fine without any crashes until the 10.6.3 upgrade.

I don't recall whether I restarted the server before I applied the 10.6.3 upgrade. I did have a backup, and yes, I use TM on the server for backups, although it does not back up any AFP/SMB data.

I never leave WGM/SA open on the server, because they will spit out errors and ultimately crash the server. Obviously, I need to use them at times. If I use them for more than a certain amount of time, it appears that within about 12 hours the server will basically crash. My crashes usually start with OD. My external mail server and VPN authenticate against the server, and those services are the first to report issues. AFP is one of the last services to crap out.

So from my experience thus far, it's definitely 10.6.3 related w/o question.

FWIW: I can upgrade my Linux servers all day long, including kernel updates, and never have to restart them. That said, I've never done a big version upgrade in place; for RHEL 5.3 to 5.4, for example, I always reinstall.
