
10.6.3 frequent crashes

I've got an xServe 2.66Ghz w/ SSD and 12GB ram running 10.6.3 server. This is a new server and was running 10.6.2 for 3-4 weeks before I upgraded to 10.6.3. I made the mistake of not testing 10.6.3 first before rolling this server into production. This server provides authentication for mobile home computers, email server, web servers, etc. It also is my main AFP server at the moment. It's running OD, RADIUS, AFP and SMB.

The first thing I noticed is that OD will stop working before any other service. This causes all sorts of issues in my environment. I do have another slightly older xServe running 10.6.3 (OD Replica) and that has been stable thus far, although it's not running AFP or SMB.

This morning the server was completely locked up and needed a full restart. I had a 6-port Small-Tree ethernet card installed and removed it, thinking the link aggregation could've been the issue. The issue still persists regardless. Other than the card, the computer is as is from Apple, including the memory.

I'll be posting log information shortly. I wanted to see if others are having issues with 10.6.3. I've seen a few threads specifically related to AFP issues in 10.6.3 and it's possible AFP is the root of the issues here. I do have AppleCare support so will be calling them today as well. Wanted to get this thread out there for anyone else who might be having the same issues as me.

xServe 2x 2.66Ghz Quad-Core Intel Xeon, Mac OS X (10.6.3), SSD, 12GB

Posted on Apr 12, 2010 9:35 AM

77 replies

Jun 15, 2010 7:41 PM in response to MattMPS

MattMPS wrote:
I finally have proof of the problem and know what causes 10.6.3 server to crash: it is DirectoryServices. I have also gotten an Apple engineer to admit that there is a DirectoryServices bug in 10.6.3 server. The thread count will climb as more services access DirectoryServices until the server is useless. To fix it, just kill DirectoryServices and it will restart itself; the server will be all good for a while.

See pic from my phone of crashed server
http://www.freeimagehosting.net/uploads/a5ef32e799.jpg


We are seeing the same thing too when looking at Activity Monitor after a crash.
How can we "just kill DirectoryServices" when the server is crashed?
Let's all hope that the recently released 10.6.4 fixes the random crashes of 10.6.3.

Jun 17, 2010 11:58 AM in response to glisse

Well, 10.6.4 does not fix this problem. I upgraded 14 of our 21 Snow Leopard servers to 10.6.4, and 4 of those 14 crashed within 24 hours of the upgrade with the same problem: the thread count of DirectoryServices grows until the server locks up. The only way to kill DirectoryServices when the server is locked up is to have physical access to it and have Terminal or Activity Monitor open before the crash. I have been playing with a sad hack, a plist that kills DirectoryServices every hour. Someone who is good at scripting could maybe write a script that watches the thread count on DirectoryServices and kills it if it goes over some number, or Apple could just fix it.

Try at your own risk.
Place the plist in /Library/LaunchDaemons and restart the server. It may or may not help; I cannot tell yet.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Disabled must be false, otherwise launchd skips the job at boot -->
    <key>Disabled</key>
    <false/>
    <key>Label</key>
    <string>com.apple.DirectoryServiceKill</string>
    <!-- LaunchDaemons already run as root, so sudo is not needed -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/killall</string>
        <string>-9</string>
        <string>DirectoryService</string>
    </array>
    <!-- Run every 3600 seconds (once an hour) -->
    <key>StartInterval</key>
    <integer>3600</integer>
</dict>
</plist>
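
For anyone who wants to try the "watch the thread count" idea instead of a blind hourly kill, here is a rough, untested-on-a-server sketch in Python. It assumes a Python 3 interpreter is installed, that `ps -axo pid=,comm=` lists one "PID COMMAND" row per process, and that `ps -M -p PID` prints one header row followed by one row per thread. The limit of 200 threads is a made-up number; the function names are mine, not anything Apple ships.

```python
import os
import subprocess

PROCESS_NAME = "DirectoryService"  # same process name the plist kills
THREAD_LIMIT = 200                 # hypothetical threshold, tune for your server


def find_pid(ps_listing, name=PROCESS_NAME):
    """Return the PID of the first process whose command basename equals name.

    ps_listing is the text of `ps -axo pid=,comm=`: one "PID COMMAND" row per line.
    """
    for row in ps_listing.splitlines():
        parts = row.strip().split(None, 1)
        if len(parts) == 2 and os.path.basename(parts[1]) == name:
            return int(parts[0])
    return None


def count_threads(ps_thread_listing):
    """Count threads in `ps -M -p PID` output (assumed: one header row, then one row per thread)."""
    rows = [row for row in ps_thread_listing.splitlines() if row.strip()]
    return max(len(rows) - 1, 0)


def needs_restart(thread_count, limit=THREAD_LIMIT):
    """True when the thread count has gone over the limit."""
    return thread_count > limit


def check_once():
    """Run one check; kill the process if it is over the limit. Returns True if it was killed."""
    listing = subprocess.check_output(["ps", "-axo", "pid=,comm="]).decode()
    pid = find_pid(listing)
    if pid is None:
        return False
    threads = subprocess.check_output(["ps", "-M", "-p", str(pid)]).decode()
    if needs_restart(count_threads(threads)):
        # Same kill as the hourly plist; launchd should restart the daemon afterwards.
        subprocess.call(["killall", "-9", PROCESS_NAME])
        return True
    return False
```

To use it, add a final `check_once()` line and point a LaunchDaemon with a short StartInterval (say 60 seconds) at the script. That way the process is only killed when the count is actually high, rather than every hour regardless.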

Jun 21, 2010 5:49 PM in response to MattMPS

On the DirectoryService Threads count issue, Activity Monitor shows it generally somewhere between 5 and 10 when all is well. Unfortunately, I haven't been looking at Activity Monitor when the Threads Count goes spiralling up above 500 so I don't know if the count goes up slowly or if it jumps up in one big step.
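
If someone does manage to log the thread count at regular intervals, a small helper like the one below could answer the "slowly or in one big step" question. This is only a sketch: the sample format, the function names and the 50% cut-off are my own assumptions, not anything from Apple's tools.

```python
def largest_jump(samples):
    """Return the (previous, current) pair of consecutive samples with the biggest rise.

    samples is a list of (timestamp, thread_count) tuples in time order.
    Returns None when there are fewer than two samples.
    """
    best = None
    for previous, current in zip(samples, samples[1:]):
        rise = current[1] - previous[1]
        if best is None or rise > best[1][1] - best[0][1]:
            best = (previous, current)
    return best


def growth_pattern(samples, step_fraction=0.5):
    """Label the growth 'step' if a single interval holds at least
    step_fraction of the total rise, otherwise 'gradual'."""
    if len(samples) < 2:
        return "unknown"
    total = samples[-1][1] - samples[0][1]
    if total <= 0:
        return "flat"
    previous, current = largest_jump(samples)
    if current[1] - previous[1] >= step_fraction * total:
        return "step"
    return "gradual"
```

For example, samples taken once a minute that read 8, 9, 7 and then 510 would be labelled "step", while a steady climb of 100 threads per sample would be labelled "gradual".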
When our server dies, we can't quit the DirectoryService process from Activity Monitor even when we have physical access to the server.
Please Apple fix this issue.

Jun 23, 2010 4:17 AM in response to MattMPS

Trying 10.6.4 today. Fingers crossed. I have eleven clients running 10.6.3 server with no problems and one with severe problems as described by the users in this post. We even tried migrating the OD to another machine, but it ended with the same result. The only difference I can see with this client is that they have a FileMaker Server on another machine that uses OD for authentication to the database.

Anybody else got FileMaker Server running?

Jun 24, 2010 8:28 AM in response to glisse

So far this is what my particular problem was, and I'm not sure it will help any of you. My Open Directory server has been up and running without any crashes for, I think, 2-3 weeks now (sorry, I can't remember the last time I posted; I really do lose track of time here).
The problem I had was Kerberos trying to authenticate to 127.0.0.1 on the local machine through the local machine's Workgroup Manager. Now I have my connection to Open Directory going out to the network and then back in, and everything is hunky-dory so far. The only strange thing is that Kerberos says it is stopped, but I'm assuming that is because it only runs when a connection is requested, because Kerberos tickets are still working. Go figure.

Jul 6, 2010 12:19 PM in response to jpbuse

Hi all,

I’ve been tracking this problem for a while now and there seem to be three prominent concurrent threads that are all alluding to the same problem.

There has also been some interesting debate so personally I think the forums have been very interesting and there is a lot of good info going around.

For background I’m running two systems:

1 Mac pro 2009, SL server (AFP, iCal, iChat, SMB, OD, NFS, SWU, Web, DNS, AB, Print, VPN, FTP, Mail, MA, NB, SQL and Push) Raid 5 and Drobo Pro on firewire. 3 2008 Mac Mini, 27 iMac, vista media centre, Macbook pro, airport extreme. This is my home/dev/test rig.

2 2008 xserve, SL server (DHCP,OD,AFP,SMB,DNS,SW,NB,Web,ical,VPN) Raid 5 and Drobo Pro on Firewire. 11 2008 iMacs, 1 XP Bootcamp iMac, airport extreme, Windows 2008 terminal server(Dell, yuk). HP printers connected locally. This is our office rig.

Upgraded to SL Server and moved to 10.6.2, apparently all running smoothly with no problems.

10.6.3 comes along and I (foolishly) jump into upgrading both clients and server. All was well for 3-4 days, then all **** breaks loose. All the clients' print queues began jamming, and then we had total loss of the server at least once if not twice a day. I’ll park the print issue for now. The server crash would start with total failure of the AFP shares (but not SMB, which is strange, as the same shares were offered over SMB to the Windows client), then loss of the directory server connection, no VPN or remote connections, and no management tools (Server Admin/WGM). It would also screw up LOM on the Xserve. Interestingly, there were no apparent problems on my home rig, but this gets used far less.

There was a problem during the upgrade which broke the iChat server, so I thought the problem was a screwed-up install. So I did a fresh install of 10.6.3 and imported the OD. Fine for two or three days, then the crashes started again. No problems on my home rig.

The crashes got too much just as 10.6.4 was released; however, on some of the other threads people are reporting no improvement with 10.6.4. So, having discussed it with the office manager, I decided to rebuild the whole lot, OD and all.

So now I have a stable 10.6.2 (server and clients), printing is back to normal (never got to debug that one) and I have started looking into this further.

I have also upgraded my home rig to 10.6.4, which appears totally stable, apart from some iCal authentication failures on the new IP4 (which despite all the hype has a great signal and is awesome). Those have not been repeated on my iPad, so I’m putting them down to an iOS 4 issue.
So I have been looking at the 10.6.2 setup and looking at the logs more closely. Lo and behold, numerous crash reports, every hour like clockwork.

SLAPD crashes every time Time Machine runs the pre-backup hook, which I also verified by manually starting Time Machine. Log entry:

04/07/2010 09:30:08 com.apple.backupd[84680] Starting standard backup
04/07/2010 09:30:09 com.apple.backupd[84680] Backing up to: /Volumes/Drobo/Backups.backupdb
04/07/2010 09:30:09 servermgrd[52431] servermgr_backup: TimeMachinePreBackupHook called.
04/07/2010 09:30:10 com.apple.ReportCrash.Root[84687] 2010-07-04 09:30:10.318 ReportCrash[84687:2803] Saved crash report for slapd[84686] version ??? (???) to /Library/Logs/DiagnosticReports/slapd 2010-07-04-093010localhost.crash
04/07/2010 09:30:12 hdiejectd[84728] running


However, the whole network holds up and nothing falls over. I have tried the voodoo and excluded /etc/var and then the whole system drive. Still the crashes; the only time they stop is by totally disengaging TM. Disabling Spotlight also has no effect.

I have looked on my 10.6.4 server and there are no crash reports at the point TM is called.

I use SA/WG/LOM daily both onsite and remotely via VPN, and it’s fine with 10.6.2 and 10.6.4, so any reference to these being the culprit is in my opinion more voodoo (I also stopped kicking the neighbour’s black cat and the crashes didn’t stop (joke)).

TM is the culprit. I think Apple tried to fix this in 10.6.3 (according to the release notes), and I think the fix is causing a greater expression of the original bug or causing more problems than it solved. Stop TM and you’ll stop the crashes.

I am eager to find out a few things:

1 What do the crash reports look like in 10.6.3? Are the SLAPD crashes at the time the TM prebackup hook is called?

2 Is 10.6.4 still causing problems?
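
In case it helps anyone answer question 1, here is a rough Python sketch that scans log lines for a slapd crash report saved shortly after the TM pre-backup hook is called. It assumes the log lines start with a "DD/MM/YYYY HH:MM:SS" timestamp like the extract I posted, and the 10-second window and function names are just my own picks.

```python
from datetime import datetime, timedelta

HOOK_MARKER = "TimeMachinePreBackupHook called"
CRASH_MARKER = "Saved crash report for slapd"


def parse_timestamp(line):
    """Return the leading 'DD/MM/YYYY HH:MM:SS' timestamp, or None if the line has none."""
    try:
        return datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None


def crashes_after_hook(lines, window_seconds=10):
    """Return (hook_time, crash_time) pairs where a slapd crash report was
    saved within window_seconds after the most recent pre-backup hook."""
    window = timedelta(seconds=window_seconds)
    last_hook = None
    hits = []
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if HOOK_MARKER in line:
            last_hook = stamp
        elif CRASH_MARKER in line and last_hook is not None:
            if timedelta(0) <= stamp - last_hook <= window:
                hits.append((last_hook, stamp))
    return hits
```

Feeding it the lines from the log extract in this post gives one match: the hook at 09:30:09 and the crash report one second later. If 10.6.3 logs show the same pairing, that would point at the same trigger.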

Sorry to waffle, this has been beating me up a lot! Not bashing the voodoo, we all fall into it when we can’t explain what’s going on, and a big thanks to all who have contributed so far!

Best Eggs.

Jul 7, 2010 2:37 PM in response to jpbuse

My continuing problems with Snow leopard Server on Xserve 2009.

No problems on 10.6.2. The 10.6.3 upgrade created all sorts of problems with student logins for managed users. The main symptom was an explosion of AFP users from the usual 200-300 to thousands, with each user going from the usual 2 instances to 10s or 100s. Initially there were more frequent crashes, due to monitoring the server with server manager on the actual server. Once I stopped that, the 2 or 3 crashes per day stopped.
The server held together with infrequent crashes even with the 10.6.4 update. Another symptom was frequent DNS deafness, which left our users unable to make internet connections. So by setting up a backup DNS on my stable 10.6.2 SLS web server, and only using server & workgroup manager on that same server, I limped through the term with no great confidence.

End of the term, the afp server suddenly went crazy with 6000 users on it with 100s of instances of the same user and unresponsive to student logins. A reboot seemed to fix this and we ended the term with me not very confident for the prospects for next term in 2 weeks time.

I too was using the print server OK on 10.6.2, but after major problems with our Art students' HP 4600 print queues following 10.6.3 and 10.6.4, we have removed them from it and gone back to direct IP printing, which is much more reliable. The DNS is still probably not working all the time, but it is hard to tell as most have the backup server covering for it.

So I still have my Mac Mini SLS 10.6.2 web server with absolutely NO crashes, and my powerful Xserve 2009 running at only 10% capacity with frequent crashes and instabilities, missing users in Workgroup Manager and DNS issues.

It would be great if Apple or someone would work out how we can solve these problems, so that I can then work out how to make my Xserve more responsive (i.e. faster logins for students) and able to use some of the other 90% CPU capacity. Off topic: if anyone has any solutions for improving network login performance, it would be greatly appreciated.

Jul 14, 2010 9:44 AM in response to Vern Dempster

So my xServe that I started this thread about had been running just fine for the past 6-8 weeks. Well, that is, until yesterday, when I did the unthinkable and opened WGM on the local server to add a new user. I know, it's asking for a lot, right? I added the new user, then immediately quit WGM. Within 24 hours I had a complaint from a user who connects over SMB that his system wasn't working. It turns out that SMB had stopped working completely, so I simply restarted the SMB service. This did nothing at all. I also noticed that a new laptop which I was trying to connect and manage via MCX was not joining correctly. Also, at this point, when I tried to open WGM on the local xServe it failed and said it could not connect. Same for Server Admin.

So, I was forced to wait until after hours and restart the server. This fixed the SMB problem for now. My new laptop still cannot join correctly, and therefore I am unable to add the new user to the laptop (mobile home account) and manage it correctly. I updated to 10.6.4 even though I've read it won't fix anything. I figured 10.6.3 is basically useless to me, so can 10.6.4 be any worse? I'll keep everyone posted as to what I find and what Apple says about this issue.

In all seriousness, I think Apple's success with the iPhone and iPad has severely taken away from its other products, specifically OS X Server. I've never been so disappointed and frustrated with Apple software before as I am with Snow Leopard server. I'm not trying to do anything fancy with these servers and the installs are basic in nature. The only services I am running on the xServe(s) are AFP, SMB, RADIUS and Open Directory. I'm looking at using a Linux platform (CentOS) to run SMB, RADIUS and perhaps even OpenLDAP moving forward, although the Mac GUI tools for OD are pretty nice, when they work. The fact is I don't trust Mac OS X server anymore and can't rely on it like I do my CentOS servers.

I'll keep everyone posted if I hear from Apple and/or if there is a fix or alternative path available.

Jason

Sep 12, 2010 3:21 PM in response to jpbuse

Continuing issues. Two weeks ago, over the weekend, I did a complete clean 10.6 server reinstall, upgraded to 10.6.4, ditto-copied mail and restored LDAP users from backup (I can do it in my sleep now). A wonderful, faster server: all working well for 10 days, and now it is back to crashing regularly. This is a much cleaner crash, directly attributable to AFP issues: users lose connection to their home folders in the Users folder. The server keeps working but must be restarted to fix the user folders. Please, can Apple or any of you offer thoughts? This is a 4-month nightmare that makes Apple servers look second rate. My 10.6.2 web server is still running flawlessly. I have contacted support and got 3 sympathetic listeners, but no solution is forthcoming and there has been no follow-up at all from support.

Sep 12, 2010 5:26 PM in response to Vern Dempster

OK. I am hoping this is not just pure luck. I took a cue from our newly purchased and installed Xserve, where the main LAN card has 127.0.0.1 as the first DNS server, followed by the two real DNS servers. No crashing after repeated use of WGM and/or SA.

I did the same to our always-crashing Mac mini server and so far, it is still up and running after several sessions with WGM / SA.
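
For anyone who prefers to script the DNS order rather than set it in System Preferences, this is a small sketch that builds a `networksetup -setdnsservers` command with 127.0.0.1 first. The service name "Ethernet 1" and the 192.0.2.x addresses in the usage note are placeholders, not our real settings, and the function names are my own.

```python
LOOPBACK = "127.0.0.1"


def dns_order(real_servers):
    """Return the DNS list with 127.0.0.1 first, then the real servers, without duplicates."""
    ordered = [LOOPBACK]
    for server in real_servers:
        if server not in ordered:
            ordered.append(server)
    return ordered


def networksetup_command(service, real_servers):
    """Build the argument list for `networksetup -setdnsservers SERVICE DNS1 DNS2 ...`."""
    return ["networksetup", "-setdnsservers", service] + dns_order(real_servers)
```

Calling `networksetup_command("Ethernet 1", ["192.0.2.10", "192.0.2.11"])` returns the argument list, which can be passed to `subprocess.call` as root. Note that this only makes sense if the server is actually running its own DNS service.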

Keeping my fingers crossed :$
