OS X 10.6.4: random AFP crashes 2-3 times a week
I'm trying to find a solution to this problem, so all input is welcome.
The problem....
Our OS X Server, which is running 10.6.4 and is only doing file sharing, keeps crashing and disconnecting all users. The only way to recover from this fault is to reboot the server. Usually this has to be a hard reset, because the server doesn't respond to SSH, ARD or even input at the console, although this is not always the case.
The setup....
2 x OSX Servers running 10.6.4
Server 1:
Open Directory
SoftwareUpdate
NetBoot
NFS
ARD Task Server
QLA Server
DeployStudio Server
Server 2:
AFP
SAMBA
Server 2 has 2 Gigabit Ethernet ports bonded as one virtual interface. It is bound to Server 1 as a client only and runs nothing other than AFP.
Server 2 hosts the DeployStudio share along with the home drives and shared drives.
Our clients log in to the machines using their OD accounts, so whenever the server crashes the machines lock up and need a hard reset to get them up and running again.
We have roughly 250 Macs and around 400 users in the OD. Each client machine is connected at Gigabit speed.
Testing and things we have tried to resolve the issue...
We tested OS X 10.6.4 for around 2 months before moving it into production. We rolled out all our client images from this server with no sign of anything wrong, and the network throughput during imaging would have been at least that of a normal work day. The only difference is that on a work day we have around 200 server connections, compared to around 50 or so when we do our imaging.
We also noticed the other day, when it crashed and we were actually able to remote in to the server, that it was fine at around 3:00, then it suddenly thought it had 7000 connections (yes, 7000) and that it was doing 1000MB/sec of throughput (yes, that's right, 1000MB/sec). This continued for around 10 minutes, then the server crashed.
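As a rough sanity check on that throughput figure, here is a minimal sketch comparing it with what the two bonded Gigabit ports could carry in theory. It assumes "MB/sec" means megabytes per second, uses decimal units (1 Gbit = 1000 Mbit), and ignores protocol overhead; the function name is just for illustration.

```python
def line_rate_mb_per_sec(ports, gbit_per_port=1.0):
    """Theoretical aggregate line rate in megabytes per second.

    Assumes decimal units (1 Gbit = 1000 Mbit) and 8 bits per byte,
    and ignores Ethernet/TCP/AFP overhead.
    """
    return ports * gbit_per_port * 1000 / 8


# Server 2 has two Gigabit ports bonded into one virtual interface.
max_rate = line_rate_mb_per_sec(ports=2)

# Figure shown by the server shortly before the crash.
reported_rate = 1000

print(f"theoretical maximum: {max_rate} MB/sec")
print(f"reported figure is {reported_rate / max_rate:.1f}x the maximum")
```

If those assumptions hold, the reported figure is well above what the hardware can physically move, which suggests the counters themselves were wrong at that point rather than the server really handling that much traffic.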
I am at a bit of a loss to understand why this crashed. I have been trawling the logs to find a cause, but so far nothing has turned up. I did notice that on the day mentioned above there was a crash log for AFS (AppleFileServer), which gives the thread that failed, but nothing that I can make any sense of.
Here is the start of the log:
Process: AppleFileServer [241]
Path: /System/Library/CoreServices/AppleFileServer.app/Contents/MacOS/AppleFileServer
Identifier: AppleFileServer
Version: ??? (???)
Code Type: X86-64 (Native)
Parent Process: launchd [1]
PlugIn Path: /usr/sbin/AppleFileServer
PlugIn Identifier: AppleFileServer
PlugIn Version: ??? (???)
Date/Time: 2010-08-05 15:58:34.086 +0930
OS Version: Mac OS X Server 10.6.4 (10F569)
Report Version: 6
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
Crashed Thread: 141
It then shows a total of 202 threads with a whole bunch of hex at the end. I can provide the entire log if you need it.
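Since the full report is long, one way to compare several crash logs is to pull out just the header fields. This is only a sketch: it assumes the header is a run of "Key: value" lines like the excerpt above and that the per-thread sections start with a line beginning "Thread " followed by a number. The function name and the sample text are for illustration only.

```python
import re

# A few header lines copied from the excerpt above, used as sample input.
SAMPLE_HEADER = """\
Process: AppleFileServer [241]
Date/Time: 2010-08-05 15:58:34.086 +0930
OS Version: Mac OS X Server 10.6.4 (10F569)
Crashed Thread: 141
"""


def parse_crash_header(text):
    """Return the 'Key: value' header fields of a crash report as a dict."""
    fields = {}
    for line in text.splitlines():
        # Assumed layout: the header ends where the thread sections begin.
        if re.match(r"^Thread \d+", line):
            break
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Split on the first colon only, so times like 15:58:34 survive.
            fields[key.strip()] = value.strip()
    return fields


header = parse_crash_header(SAMPLE_HEADER)
crashed_thread = int(header["Crashed Thread"])
print(header["Process"], header["Date/Time"], crashed_thread)
```

Run over each saved report, this would show whether the crashes always come from the same process and how the crash times line up with the connection spikes.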
Also, the server does not drop any pings at any time during the crashes.
I would appreciate any help with this issue, and I'd also like to hear about other people's experience with AFP. Does this sort of thing happen to you, or is it just me?
Thanks in advance.
Message was edited by: kyleh0000