
OSX 10.6.4 Random AFP Crashes 2-3 times a week

Hi All,

I'm trying to find a solution to this problem, so all input is welcome.

The problem....

Our OSX Server, which runs 10.6.4 and only does file sharing, keeps crashing and disconnecting all users. The only way to recover from this fault is to reboot the server. Sometimes the server doesn't respond to SSH, ARD or even input at the console; this is not always the case, but it usually requires a hard reset to get it back up.

The setup....

2 x OSX Servers running 10.6.4

Server 1:

Open Directory
SoftwareUpdate
NetBoot
NFS

ARD Task Server
QLA Server
DeployStudio Server

Server 2:

AFP
SAMBA

Server 2 has two Gigabit Ethernet ports bonded as one virtual interface, is bound to Server 1 as a client only, and runs nothing other than AFP.

Server 2 hosts the DeployStudio share along with home drives and shared drives.

Our clients log in to the machines using their OD accounts, so whenever the server crashes the machines lock up and need to be hard reset to get them up and running again.

We have roughly 250 Macs and around 400 users in the OD. Each client machine is connected at Gigabit speed.

Testing and things we have tried to resolve the issue...

We tested OSX 10.6.4 for around 2 months before moving it into production. We rolled out all our client images with this server with no signs of anything wrong, and the network throughput would have been at least that of a normal work day. The only difference is that on a work day we have around 200 server connections, compared to around 50 or so when we do our imaging.

We also noticed the other day, when it crashed and we were actually able to remote in to the server, that it was fine at around 3:00. Then it suddenly thought it had 7000 connections (yes, 7000) and that it was doing 1000MB/sec throughput (yes, that's right, 1000MB/sec). This continued for around 10 minutes, then the server crashed.
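For anyone wanting to catch this kind of jump before the server falls over, here is a minimal sketch of a spike check in Python. It assumes you already have periodic connection counts from somewhere (how you collect them is up to you and is not shown); the function name, thresholds and sample numbers are made up for illustration and only loosely follow the figures above.

```python
from statistics import median

def find_spikes(samples, factor=10, min_count=1000):
    """Return the labels of samples whose connection count looks runaway.

    samples: list of (label, connection_count) in time order.
    A sample is flagged when its count is at least min_count and
    more than factor times the median of all earlier samples.
    """
    spikes = []
    for i, (label, count) in enumerate(samples):
        history = [c for _, c in samples[:i]]
        if not history:
            continue  # nothing to compare the first sample against
        baseline = median(history)
        if count >= min_count and count > factor * baseline:
            spikes.append(label)
    return spikes

# Illustrative numbers only: a normal day near 200 connections, then a jump.
samples = [("14:50", 195), ("14:55", 202), ("15:00", 198),
           ("15:05", 7000), ("15:10", 6950)]
print(find_spikes(samples))
```

The median is used as the baseline so that one or two runaway readings do not immediately drag the "normal" level up with them.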

I am at a bit of a loss to understand why this crashed. I have been trawling the logs to find a solution, but so far nothing has turned up. I did notice that on the day mentioned above there was a crash log for AFS (AppleFileServer), which gives the thread that fails but nothing that I can make any sense of.

Here is the start of the log:

Process: AppleFileServer [241]
Path: /System/Library/CoreServices/AppleFileServer.app/Contents/MacOS/AppleFileServer
Identifier: AppleFileServer
Version: ??? (???)
Code Type: X86-64 (Native)
Parent Process: launchd [1]

PlugIn Path: /usr/sbin/AppleFileServer
PlugIn Identifier: AppleFileServer
PlugIn Version: ??? (???)

Date/Time: 2010-08-05 15:58:34.086 +0930
OS Version: Mac OS X Server 10.6.4 (10F569)
Report Version: 6

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
Crashed Thread: 141

It then shows a total of 202 threads with a whole bunch of hex at the end. I can provide the entire log if you need it.

Also, the server does not drop any pings at any time during the crashes.

I would appreciate any help with this issue, and would also like to hear other people's experience with AFP. Does this sort of thing happen to you, or is it just me?

Thanks in advance.
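If you end up with a pile of these crash reports to compare, a small script can pull the key fields out of each header. The following is a rough Python sketch, not an Apple tool: the function names are hypothetical, the sample text is a shortened copy of the header above, and the exception names are spelled with underscores as crash reports normally print them.

```python
import re

SAMPLE_HEADER = """\
Process: AppleFileServer [241]
Code Type: X86-64 (Native)
Date/Time: 2010-08-05 15:58:34.086 +0930
OS Version: Mac OS X Server 10.6.4 (10F569)
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
Crashed Thread: 141
"""

def parse_crash_header(text):
    """Split 'Key: value' header lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip blank lines and lines without a 'Key: ' prefix
            fields[key.strip()] = value.strip()
    return fields

def summarize(fields):
    """Pick out the process, exception, crashed thread and faulting address."""
    match = re.search(r"at (0x[0-9a-fA-F]+)", fields.get("Exception Codes", ""))
    address = int(match.group(1), 16) if match else None
    thread = fields.get("Crashed Thread")
    return {
        "process": fields.get("Process", "").split(" [")[0],
        "exception": fields.get("Exception Type", ""),
        "crashed_thread": int(thread) if thread is not None else None,
        "null_dereference": address == 0,  # address 0x0 means a null pointer access
    }

print(summarize(parse_crash_header(SAMPLE_HEADER)))
```

For the header above, the faulting address of 0x0 means the crashed thread tried to read or write through a null pointer.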

Message was edited by: kyleh0000

Mac OS X (10.5.4)

Posted on Aug 5, 2010 10:17 PM

27 replies

Sep 7, 2010 4:22 PM in response to pcolvin15

Hi all, thanks for your interest in this topic. I know MANY others are having the same issue, and the simple fact is that Apple has not fixed it and is not giving us any indication of when it will be fixed.

I understand that other users have not experienced this issue, as it seems to be more apparent with enterprise businesses running large-scale operations.

However, smaller operations are also affected in different ways. I would be surprised if you haven't experienced this issue, though I have found that sometimes you won't even notice it. For example, we had a crash the other day and all that happened was everyone got the spinning wheel of death for a few minutes, and the server picked up exactly where it left off with no problems and no interaction from ICT staff. Sometimes it logs in the console and other times it doesn't.

The problem becomes particularly apparent when users are logging in using network accounts, i.e. the desktop is on the server. This means that when the server crashes, the desktop and the machine lock up. We are then able to restart the AFP service and usually everything picks up again fine.

So if you're a small operation, I would expect that you log in to the machine locally, and when the crash occurs the few people that use it would only notice the problem if at that particular time they were moving data from the local machine to the server. Even then the file copy may just pause for a few seconds and the users would not even notice.

When you start to scale this up, it seems that the more clients are connected, the more frequent the crashes and the less able clients are to recover from such a crash. This becomes particularly apparent when you use network accounts, as we do.

So far people have found 3 workarounds for the problem:

The first is to log in locally to reduce the load on the servers, and so reduce the severity and frequency of the crashes.

The second is to buy an absurd amount of hardware and split the load across the servers, again to reduce the frequency of the crashes. I suspect this is why Apple is in no rush to fix the issue.

The last option is to use a Linux file server, which I think is what we will do for a long-term solution. I have found plenty of guides on the internet on how to get this done, and even a company that provides official support for doing this.

I realise this means we will get no support from Apple if we decide to go down this path, but as Apple support is not worth the paper it's written on, I don't see this as being an issue.

However, we are seeking anyone with experience who has gone down this path to see how successful it has been. So far we have created a proof-of-concept server and it's working great; we are just starting to do load testing on it now.

So if anyone has done this before, please let me know how it has gone. It would be a great help!

Sep 23, 2010 5:06 AM in response to kyleh0000

Hi

We're in the same boat. Randomly, AFP connections will multiply out of control, hitting anywhere between 3000 and 7000 connections!

I've disabled Spotlight on all shares and volumes, repaired prefs etc., but it's still the same.

This only started after the 10.6.4 update. Never had any problems with 10.6.2. We skipped 10.6.3 after reading a few stories on here.

Is there an easy rollback method?

Sep 26, 2010 3:54 PM in response to Lee Howard1

Sorry, we had to do a complete rebuild to 10.6.2. However, this did NOT fix the issue; we are still getting just as many crashes, if not more, whereas previously we were running OK.

The rebuild was not a big issue for us as the only service this was running was AFP.

As for Spotlight and Time Machine, I have turned these off and they have been off for a while, but neither seems to help.

The Linux server solution is coming along nicely and we are about to move all staff over to it.

For now all users are just logging in locally until we can prove the Linux solution is stable enough for everyone to move to.

Oct 11, 2010 1:15 AM in response to Ben Pirozzolo

Ok

Apple have confirmed there is a bug in 10.6.4 which is causing these AFP crashes.
However, they could not suggest a workaround or confirm it would be fixed in 10.6.5.
Best ETA for 10.6.5 was within the next 6 months.

Our server doesn't use bonded Ethernet ports, so I don't think that is related.

This weekend we've rolled our server back to 10.6.1. This version has previously run for over a year without even a reboot. So I'll see how it goes this week...

Lee

Oct 22, 2010 1:13 AM in response to Jackoncept

Update...

Our rollback to 10.6.1 worked for 10 days and then the fault reappeared 😟

It looks to me like the AFP 'bug' is also in the 10.6.4 client, and it's the client machines causing the server to trip up. They seem to lose their AFP connection and auto-reconnect, creating hundreds of ghost connections on the server per user.
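One way to check this theory is to count sessions per user and see who is holding far more than they should. Below is a minimal Python sketch; it assumes you can export the connected-user list as (user, client address) pairs by whatever means you have, and the function name, the threshold of 5 and the sample data are all made up for illustration.

```python
from collections import Counter

def ghost_suspects(sessions, max_per_user=5):
    """Return {user: session_count} for users holding more than max_per_user sessions.

    sessions: iterable of (user, client_address) pairs, one per AFP connection.
    """
    counts = Counter(user for user, _client in sessions)
    return {user: n for user, n in counts.items() if n > max_per_user}

# Illustrative data only: one client that keeps reconnecting, one normal user.
sessions = [("alice", "10.0.0.5")] * 300 + [("bob", "10.0.0.6")] * 2
print(ghost_suspects(sessions))
```

If the same few users or client addresses show up every time, that would support the idea that particular client machines are triggering the pile-up.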

See here http://efreedom.com/Question/2-109676/Preventing-Ghost-AFP-Mount-Points

He's running a 10.5 server with 10.6 clients and getting the same thing, so surely this points to the client OS.

We've only had this problem after updating the server and clients to 10.6.4.

So my options are to roll back all the clients to 10.6.1, which I know previously ran for a year, or to wait and hope 10.6.5 will fix the issue.

My guess is that 10.6.5 will be out within the next 90 days as it will integrate with the OS X App Store announced on Wednesday

regards

Lee

Oct 26, 2010 5:03 PM in response to Lee Howard1

Hi all
Great to see another discussion on this issue. I had contributed to "10.6.3 frequent crashes" http://discussions.apple.com/thread.jspa?messageID=12337042#12337042 but I get an error when I link to it (strange), so I discovered this discussion, which matches my situation since May at my school. Xserve with 200-300 connections usually. Issues started with 10.6.3 (10.6.2 worked fine). Crashes 3 or 4 times a day, with 1000s of duplicated ghost connections. After 3 or 4 weeks, coped by turning off Time Machine and not using any programs on the server (as discussed in the previous discussion). Reasonably stable, but one particular period of the school day would consistently cause the issue.

Tried Apple: sympathetic but no assistance. Rolled back to 10.6.2, fresh install of 10.6.2, 10.6.4 etc. Problem still there today. Using Time Machine to recover a file crashed AFP immediately (I run Time Machine manually outside of school time occasionally to back up students' docs). Using Server Admin on the server crashes AFP within an hour or 2.

Today was a new thing. I inserted a DVD to copy it into a share and AFP died within 10 secs of insertion. No one could log in and no shares were available. It shows quite clearly in the logs. I also notice that iPhoto '09 is incredibly heavy on AFP usage.

I had not thought of 10.6.4 clients adding to the problem. I have about 10 that are upgraded from 10.6.2, so maybe that does add to the problem. 50% of our machines are 10.5.8; maybe that doesn't help either.

Oct 26, 2010 5:17 PM in response to Vern Dempster

Just wanted to update everyone: we are still having problems with OSX Server.

However, we have now been running SLES 11 with Netatalk for around 50 users as a test, with the intention to ditch OSX completely at the start of next year. So far we have had 0 major issues; there have been a few minor ones, but they have all been related to the migration of data from the HFS file system to ext3. As far as speed and stability go, it's a HUGE HUGE HUGE improvement over Apple, although that's not hard because the Apple server OS basically just doesn't work; the client software works, just not very well.

We have been running SLES for around 3 weeks now, and we are getting more and more users asking to be put on the server after the positive feedback from the original test group.

I'm sure we will be testing 10.6.5 when it eventually gets released; however, even if it does fix the issue (and I don't think it will), we will probably still be running with Linux.

If anyone wants any help moving away from Apple servers and software, please let me know and I can provide some help building the Linux server.

cheers
