
10.6.3 frequent crashes

I've got an xServe 2.66GHz w/ SSD and 12GB RAM running 10.6.3 Server. This is a new server and was running 10.6.2 for 3-4 weeks before I upgraded to 10.6.3. I made the mistake of not testing 10.6.3 before rolling this server into production. This server provides authentication for computers with mobile home directories, the email server, web servers, etc. It is also my main AFP server at the moment. It's running OD, RADIUS, AFP and SMB.

The first thing I noticed is that OD will stop working before any other service. This causes all sorts of issues in my environment. I do have another slightly older xServe running 10.6.3 (OD Replica) and that has been stable thus far, although it's not running AFP or SMB.

This morning the server was completely locked up and needed a full restart. I had a 6-port Small-Tree Ethernet card installed and removed it, thinking the link aggregation could've been the issue. The issues persist regardless. Other than the card, the machine is as shipped from Apple, including the memory.

I'll be posting log information shortly. I wanted to see if others are having issues with 10.6.3. I've seen a few threads specifically related to AFP issues in 10.6.3, and it's possible AFP is the root of the issues here. I do have AppleCare support, so I will be calling them today as well. I wanted to get this thread out there for anyone else who might be having the same issues.

xServe 2x 2.66GHz Quad-Core Intel Xeon, Mac OS X (10.6.3), SSD, 12GB

Posted on Apr 12, 2010 9:35 AM

77 replies

Apr 12, 2010 7:31 PM in response to gsfunkarch

I haven't had the chance yet to grab log files from the server but will shortly. One interesting thing is that my OD replica server, which is an older model Intel xServe, is also running 10.6.3 and has been stable since the install. Both servers were clean installs, not upgrades. My replica server, at the moment, is only running OD and not AFP or other services.

Apr 21, 2010 2:20 PM in response to jpbuse

I think this crashing problem has existed in all versions of 10.6 Server, or at least it has for us. We have 18 Snow Leopard XServes running 10.6.3, and I cannot get any of them to run longer than 2 weeks without crashing; some crash daily. I was hoping 10.6.2 would fix it and it did not; the same goes for 10.6.3. All servers are fresh installs, and I even have 4 new Xserves with the OS installed by Apple, and they all have the same problem. I have called Apple, had our SE out, and had an Apple Integration specialist out, and the best they could come up with was to write a script to restart the servers every night. That does not even work, because the crashes are so random. My current working theory is that it has something to do with disk I/O and Kerberos: if the servers are just running OD they do not crash as often, but we need AFP and SMB, so that is not an option. One thing I have found in the logs, which no one I have talked to can explain, is the following. It shows up in the logs each time a user connects to a share:

Apr 21 15:15:29 server-safp-01 gssd[4510]: Minor error = 100006:
Apr 21 15:15:29 server-safp-01 gssd[4510]: Error returned by svc mach_gss_init_seccontext:
Apr 21 15:15:29 server-safp-01 gssd[4510]: Major error = 851968: Unspecified GSS failure. Minor code may

If anyone has any idea how to make Snow Leopard Server more stable, let me know.

Apr 21, 2010 7:43 PM in response to jpbuse

We have 4 xserves running, 1 is the OD master, the other 3 are OD replicas. They have all been randomly hanging. The hanging only started after updating them all to 10.6.3. Before the update to 10.6.3 they were running 10.5.8 in the same OD master/replica arrangement, all running very stable.
I did not upgrade the OS; I performed a clean (erase and install) installation. I did not import my 10.5.8 OD information; I set up a fresh, clean OD system.

I have been leaving ssh sessions open on the servers and have been able to remain connected during the 'hang'; however, the output of most commands run during the hang is sporadic at best. For example:

xserve1:/ root# reboot
-sh: /sbin/reboot: Permission denied
xserve1:/ root# tail /var/log/system.log
xserve1:/ root# ls -la /sbin/reboot
-r--r--r--@ 2433 root 264579 0 Apr 9 09:57 /sbin/reboot
xserve1:/ root#


When the server is running OK:
xserve1:~ root# ls -la /sbin/reboot
-r-xr-xr-x 2 root wheel 55760 Feb 11 19:58 /sbin/reboot

As you can see, the uid mapping is completely screwed and the filesystem is reporting the reboot binary as being 0 bytes in size! All commands above were run as root.
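For anyone who wants to compare listings like these automatically, here is a minimal Python sketch (my own, not anything Apple ships) that parses the two ls -la lines above and flags the symptoms described: a 0-byte size, no execute bits, and a group shown as a bare number instead of a name. The regex only assumes the long-listing layout shown in this post; it is not a general ls parser, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumes the BSD "ls -la" long-listing layout seen above: mode string
# (with an optional trailing "@" for extended attributes), link count,
# owner, group, size in bytes, then date and path.
LS_LINE = re.compile(
    r"^(?P<mode>[-bcdlps][-rwxsStT]{9})(?P<xattr>@?)\s+(?P<links>\d+)\s+"
    r"(?P<owner>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+(?P<rest>.+)$"
)


@dataclass
class Entry:
    mode: str
    links: int
    owner: str
    group: str
    size: int
    rest: str


def parse_ls(line: str) -> Entry:
    """Split one long-listing line into its fields."""
    m = LS_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised ls line: {line!r}")
    return Entry(m["mode"], int(m["links"]), m["owner"], m["group"],
                 int(m["size"]), m["rest"])


def anomalies(entry: Entry) -> list:
    """Flag the symptoms described above for what should be an executable."""
    problems = []
    if entry.size == 0:
        problems.append("size is 0 bytes")
    if "x" not in entry.mode[1:]:
        problems.append("no execute bits")
    if entry.group.isdigit():
        # ls prints a bare number when the group ID cannot be mapped to a name
        problems.append(f"group {entry.group} not resolved to a name")
    return problems


good = parse_ls("-r-xr-xr-x 2 root wheel 55760 Feb 11 19:58 /sbin/reboot")
bad = parse_ls("-r--r--r--@ 2433 root 264579 0 Apr 9 09:57 /sbin/reboot")
print(anomalies(good))  # []
print(anomalies(bad))
```

The healthy listing produces no warnings, while the listing captured during the hang trips all three checks; the jump in link count (2 to 2433) is visible in the parsed fields too, but there is no baseline in the code to judge it against.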

I have created a case with Apple support, but they report no other issues with 10.6.3 Server.
I have also disabled all third-party software running on these servers: radmind, networker, deploystudio. I even unbound the servers from Active Directory, as Apple support informed me I would have to pay a fee to get my issue looked at while the machines were bound to AD.
Even after disabling all these third-party (AD plugin???) apps, the servers still freeze/hang.

I have tried rebooting them via LOM, mostly with no result; every now and then a LOM restart works, but the majority of the time it does not.

Services currently running on the servers are:
AFP - some file sharing - NOT home directories.
NETBOOT
OPENDIRECTORY (Kerberos disabled).

Apr 22, 2010 8:11 AM in response to Francois Herbert

Hey All,

I too am having the same issue with the server randomly hanging. The only fix is to press the power button. The server is running OD/AFP/NetBoot/DHCP/NAT/Firewall and has only 36 client machines and 1200 user accounts, but only a few users log in at one time. It is also running 10.6.3 on a brand new Xserve with SSD and a 3TB RAID.

Your help would be fantastic Apple 🙂

Apr 23, 2010 8:05 AM in response to Francois Herbert

I opened a case about this with Apple in November (Case ID 143620565); the problem has always existed in 10.6. I think it is becoming apparent now that more people are moving to Snow Leopard Server. The problem has to do with file sharing load: the more connections, the faster it crashes. We have moved most of our shares off the 10.6 servers because of this problem and the lack of a solution from Apple.

Apr 23, 2010 5:46 PM in response to MattMPS

I have managed to get the output of dmesg during a system hang:
0 [Level 3] [ReadUID 0] [Facility com.apple.system.fs] [ErrType IO] [ErrNo 6] [IOType Read] [PBlkNum 18901936] [LBlkNum 0] [FSLogMsgID 313908054] [FSLogMsgOrder First]
0 [Level 3] [ReadUID 0] [Facility com.apple.system.fs] [DevNode /dev/disk0s2] [MountPt > [Path /private/var/log/asl/StoreData] [FSLogMsgID 313908054] [FSLogMsgOrder Last]
disk0s2: media is not present.

Considering disk0s2 is my system partition, I can see why this is causing issues!
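The two dmesg lines above share the same FSLogMsgID (one marked First, one Last), so together they describe a single failed read. A minimal Python sketch, assuming only the bracketed [Key Value] layout quoted above, that merges them into one record; the function names are hypothetical, and the truncated MountPt field is skipped rather than guessed at. ErrNo 6 corresponds to ENXIO ("Device not configured" on BSD/macOS), which fits the "media is not present" message.

```python
import errno
import re

# Each field in these dmesg lines looks like "[Key Value]" (layout taken from
# the excerpt above). A field not closed before the next "[" (like the
# truncated MountPt entry) does not match and is skipped.
FIELD = re.compile(r"\[(\w+) ([^\[\]]*)\]")


def parse_fslog(line):
    """Return the [Key Value] fields of one kernel file-system log line."""
    return dict(FIELD.findall(line))


def merge_by_msg_id(lines):
    """Combine the First/Last lines of each message, keyed by FSLogMsgID."""
    merged = {}
    for line in lines:
        fields = parse_fslog(line)
        msg_id = fields.get("FSLogMsgID")
        if msg_id is not None:
            merged.setdefault(msg_id, {}).update(fields)
    return merged


dmesg = [
    "0 [Level 3] [ReadUID 0] [Facility com.apple.system.fs] [ErrType IO] "
    "[ErrNo 6] [IOType Read] [PBlkNum 18901936] [LBlkNum 0] "
    "[FSLogMsgID 313908054] [FSLogMsgOrder First]",
    "0 [Level 3] [ReadUID 0] [Facility com.apple.system.fs] [DevNode /dev/disk0s2] "
    "[MountPt > [Path /private/var/log/asl/StoreData] "
    "[FSLogMsgID 313908054] [FSLogMsgOrder Last]",
]
record = merge_by_msg_id(dmesg)["313908054"]
# prints: /dev/disk0s2 /private/var/log/asl/StoreData ENXIO
print(record["DevNode"], record["Path"], errno.errorcode[int(record["ErrNo"])])
```

The merged record ties the I/O error on physical block 18901936 to /dev/disk0s2 and to the ASL log store path, which is the connection made in the post above.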

After talking with some of the other admins who use the Server Admin tools remotely on this server, it appears that the issue 'may' be related to using Workgroup Manager remotely. This particular admin had upgraded to the 10.6.3 Server Admin Tools using Software Update, and whenever a new plist was added for a group, it would cause this error (along with the following error in WGM: "[MCXAdmin] Debug Message Unexpected error. File:/SourceCache/WorkgroupManagerPreferences/WorkgroupManagerPreferences-437/LowLevelEditor/PrefDetailsPI.mm Line:1542").

This dmesg error would appear on the OD master first, then trickle through to the OD replicas, rendering them all useless.

It's rather concerning that adding a plist in WGM would cause this sort of kernel-level file I/O issue. We have since reinstalled the Server Admin Tools on the 'suspect' desktop and are waiting to see whether the system fails again.

Message was edited by: Francois Herbert - Had to escape the square brackets in the dmesg log so they would appear

May 3, 2010 11:51 PM in response to jpbuse

Same here.
New xserves: one OD server, one replica and one mail server. They often crash several times a day for no apparent reason.
From what I see in system.log, the last entries before a crash are always from 'servermgrd'.
Over the last weekend I made sure that no 'Server Admin' application was running on any machine, including my admin workstation.
Guess what: No crash for three days.
Yesterday I had to make some changes using 'Server Admin', and the crashes happened again.

From my point of view, this issue is somehow related to 10.6.3.
I did not have any problems when working with 10.6.2 on the same machines.

Could someone confirm whether this theory is plausible?

May 4, 2010 4:25 AM in response to jpbuse

What I am seeing in my system log is:

May 4 10:49:33 ldap servermgrd[70]: Allocated size has grown to 3M. Number of allocations is -4980. Exiting to clear possible memory leak.
May 4 10:49:33 ldap com.apple.launchd[1] (com.apple.servermgrd[70]): Exited with exit code: 12
May 4 10:49:33 ldap servermgrd[21158]: servermgr_accounts: got error 5300 trying to auth to local LDAP node
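To test the earlier theory that crashes follow servermgrd activity, one could pull these particular messages out of system.log and line their timestamps up against the times the server hung. A minimal Python sketch; the patterns are built only from the three lines quoted above (other OS builds or other servermgrd messages may be worded differently), and the function name is hypothetical.

```python
import re

# Built only from the three system.log lines quoted above (assumption: other
# OS builds or other servermgrd messages may be worded differently).
SYSLOG = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]:?\s+(?P<msg>.*)$"
)
LEAK_EXIT = re.compile(
    r"Allocated size has grown to (?P<size>\S+)\. "
    r"Number of allocations is (?P<count>-?\d+)\. Exiting"
)
LAUNCHD_EXIT = re.compile(
    r"\(com\.apple\.servermgrd\[(?P<pid>\d+)\]\): Exited with exit code: (?P<code>\d+)"
)
AUTH_ERROR = re.compile(r"got error (?P<code>\d+) trying to auth")


def servermgrd_events(lines):
    """Return (timestamp, pid, kind, detail) tuples for servermgrd-related lines."""
    events = []
    for line in lines:
        m = SYSLOG.match(line.strip())
        if m is None:
            continue
        proc, msg = m["proc"], m["msg"]
        if proc == "servermgrd":
            if leak := LEAK_EXIT.search(msg):
                # servermgrd exiting itself to clear a suspected memory leak
                events.append((m["ts"], int(m["pid"]), "leak-exit", leak["size"]))
            elif auth := AUTH_ERROR.search(msg):
                # the restarted process failing to authenticate to local LDAP
                events.append((m["ts"], int(m["pid"]), "auth-error", auth["code"]))
        elif proc == "com.apple.launchd":
            if exited := LAUNCHD_EXIT.search(msg):
                # launchd recording the servermgrd exit and its exit code
                events.append((m["ts"], int(exited["pid"]), "launchd-exit", exited["code"]))
    return events


log = [
    "May 4 10:49:33 ldap servermgrd[70]: Allocated size has grown to 3M. Number of "
    "allocations is -4980. Exiting to clear possible memory leak.",
    "May 4 10:49:33 ldap com.apple.launchd[1] (com.apple.servermgrd[70]): "
    "Exited with exit code: 12",
    "May 4 10:49:33 ldap servermgrd[21158]: servermgr_accounts: got error 5300 "
    "trying to auth to local LDAP node",
]
for event in servermgrd_events(log):
    print(event)
```

On a real machine the same function could be fed the lines of a full system.log; only lines matching these three shapes are reported, everything else is ignored.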

