13600 Views, 27 Replies. Latest reply: Oct 26, 2010 5:17 PM by kyleh0000
I have two Mac minis running SNL 10.4.6, with 4GB of RAM. One is set up as the OD master, with DHCP, DNS, print service, Web, and WINS. AFP and SMB are active for its own shares. The other is set up as an OD replica and runs AFP, iCal, iChat, Mail, SMB, SUS, and is the Web/Wiki master. It supplies the file services using a DroboPro over iSCSI with 4TB of disk space. We used to use a LaCie 4Quad, but found that the FireWire interface would quickly become I/O bound, which caused a number of volume directory crunches and side issues (such as not being able to mount it in single-user mode or run DiskWarrior on it in safe mode) until we moved to the Drobo.
I've never had an AFP crash, and I've been up since October, and I have never heard of the AFP/OD bug you mentioned. I only support 7 users and a small amount of streaming video to local clients. My clients are a mix of iMacs, MBPs, a couple of Dell PCs with XP, and a Win2K3 server set up as a standalone. My clients also run Parallels 5 and XP.
I support outside VPN through Equinux VPNTracker, iPhone access, and site-to-site VPN connectivity using SonicWall TZs.
The largest issue I've hit has been repeated LDAP corruption, which has me having to destroy and rebuild it every so often (or whenever my diradmin suddenly can't log in).
Hi all, thanks for your interest in this topic. I know MANY others are having the same issues, and the simple fact is that Apple have not fixed this issue and are not giving us any indication of when it will be fixed.
I understand that other users have not experienced this issue, as it seems to be more apparent in enterprise businesses running large-scale operations.
However, smaller operations are also affected in different ways. I would be surprised if you haven't experienced this issue, but I have found that sometimes you won't even notice it. For example, we had a crash the other day and all that happened was that everyone got the spinning wheel of death for a few minutes, and the server picked up exactly where it left off with no problems and no interaction from ICT staff. Sometimes it logs in the console and other times it doesn't.
The problem becomes particularly apparent when users are logging in using network accounts, i.e. the desktop is on the server. This means that when the server crashes, the desktop and the machine lock up. We are then able to restart the AFP service, and usually everything picks up again fine.
So if you're a small operation, I would expect that you log in to the machine locally, and when the crash occurs the few people that use the server would only notice the problem if at that particular time they were moving data from the local machine to the server. Even then the file copy may just pause for a few seconds and the users would not even notice.
When you start to scale this up, it seems that the more clients are connecting, the more frequent the crashes and the less able clients are to recover from such a crash. This becomes particularly apparent when you use network accounts, as we do.
So far people have found 3 workarounds for the problem...
One is to log in locally to reduce the load on the servers, which reduces the severity and frequency of the crashes.
The 2nd is to buy an absurd amount of hardware and split the load across the servers, again to reduce the frequency of the crashes. I suspect this is why Apple are in no rush to fix the issue.
And the last option is to use a Linux file server, which I think is what we will do as a long-term solution. I have found plenty of guides on the internet on how to get this done, and even a company that provides official support for doing this.
I realise this means we will get no support from Apple if we decide to go down this path, but as Apple support is not worth the paper it's written on, I don't see this as being an issue.
However, we are seeking anyone with experience who has gone down this path, to see how successful it's been. So far we have created a proof-of-concept server and it's working great; we are just starting to do load testing on it now.
So if anyone has done this before, please let me know how it has gone. It would be a great help!
We're in the same boat. AFP connections will randomly multiply out of control, hitting anywhere between 3000 and 7000 connections!
I've disabled Spotlight on all shares and volumes, repaired prefs, etc., but it's still the same.
This only started after the 10.6.4 update. Never had any problems with 10.6.2. We skipped 10.6.3 after reading a few stories on here.
Is there an easy rollback method?
Sorry, we had to do a complete rebuild to 10.6.2. However, this did NOT fix the issue: we are still getting just as many crashes, if not more, whereas previously we were running OK.
The rebuild was not a big issue for us, as the only service this server was running was AFP.
As for Spotlight and Time Machine, I have turned both off and they have been off for a while, but neither seems to help.
The Linux server solution is coming along nicely and we are about to move all staff over to it.
For now all users are just logging in locally until we can prove the Linux solution is stable enough for everyone to move to.
I can only report that I am having the same problem.
However, can I assume that everybody who is having this problem has bonded Ethernet ports? I have our 2 onboard Xserve Ethernet ports bonded into one virtual connection. This has caused many a strange problem in the past; I'm guessing this is another.
Apple have confirmed there is a bug in 10.6.4 which is causing these AFP crashes.
However, they could not suggest a workaround or confirm it would be fixed in 10.6.5.
Best ETA for 10.6.5 was within the next 6 months.
Our server doesn't use bonded Ethernet ports, so I don't think that is related.
This weekend we've rolled our server back to 10.6.1. This version has previously run for over a year without even a reboot. So I'll see how it goes this week...
Our rollback to 10.6.1 worked for 10 days and then the fault reappeared.
It looks to me like the AFP 'bug' is also in the 10.6.4 client, and it's the client machines causing the server to trip up. They seem to lose their AFP connection and auto-reconnect, creating hundreds of ghost connections on the server per user.
See here http://efreedom.com/Question/2-109676/Preventing-Ghost-AFP-Mount-Points
He's running a 10.5 server with 10.6 clients and getting the same thing, so surely this points to the client OS.
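For anyone who wants to check whether a handful of clients account for most of the ghost connections, here is a rough Python sketch that counts established AFP connections per client address from `netstat -an` output. It assumes AFP is on its standard TCP port 548 and that the output has the usual proto/recv-q/send-q/local/foreign/state columns. The function names and the threshold of 20 are made up for illustration, not taken from any Apple tool.

```python
from collections import Counter

AFP_PORT = "548"  # AFP over TCP uses port 548


def split_host_port(addr):
    """Split 'host.port' (Mac OS X netstat) or 'host:port' (Linux netstat)."""
    sep = max(addr.rfind("."), addr.rfind(":"))
    return addr[:sep], addr[sep + 1:]


def afp_connections_per_client(netstat_text):
    """Count ESTABLISHED connections to the local AFP port, keyed by client address."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        _, local_port = split_host_port(local)
        if local_port != AFP_PORT:
            continue
        client, _ = split_host_port(foreign)
        counts[client] += 1
    return counts


def suspicious_clients(counts, threshold=20):
    """Return (client, count) pairs at or above a hypothetical threshold, busiest first."""
    return [(client, n) for client, n in counts.most_common() if n >= threshold]
```

To try it on the server, save the output of `netstat -an` to a file, read the file into a string, and pass that string to `afp_connections_per_client`. A client that normally holds one or two connections but shows up with dozens or hundreds would be a likely candidate for the reconnect behaviour described above.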
We've only had this problem after updating the server and clients to 10.6.4.
So my options are to roll back all the clients to 10.6.1, which I know previously ran for a year, or to wait and hope 10.6.5 will fix the issue.
My guess is that 10.6.5 will be out within the next 90 days, as it will integrate with the OS X App Store announced on Wednesday.
Great to see another discussion on this issue. I had contributed to "10.6.3 frequent crashes" http://discussions.apple.com/thread.jspa?messageID=12337042#12337042 but I get an error when I link to it (strange), so I found this discussion, which describes my situation at my school since May. We have an Xserve with usually 200-300 connections. Issues started with 10.6.3 (10.6.2 worked fine). It crashes 3 or 4 times a day, with 1000s of duplicated ghost connections. After 3 or 4 weeks we coped by turning off Time Machine and not using any programs on the server (as discussed in the previous discussion). It was reasonably stable, but one particular period of the school day would consistently cause the issue.
Tried Apple: sympathetic but no assistance. Rolled back to 10.6.2, did a fresh install of 10.6.2, 10.6.4, etc. The problem is still there today. Using Time Machine to recover a file crashed AFP immediately. (I run Time Machine manually outside of school time occasionally to back up students' docs.) Using Server Admin on the server crashes AFP within an hour or 2.
Today was a new thing. I inserted a DVD to copy it into a share and AFP died within 10 secs of insertion. No one could log in and no shares were available. It shows quite clearly in the logs. I also notice that iPhoto '09 is incredibly heavy on AFP usage.
I had not thought of 10.6.4 clients adding to the problem. I have about 10 that were upgraded from 10.6.2. Maybe that does add to the problem. 50% of our machines are on 10.5.8; maybe that doesn't help either.
Just wanted to update everyone: we are still having problems with OS X Server.
However, we have now been running SLES 11 with Netatalk for around 50 users as a test, with the intention to ditch OS X completely at the start of next year. So far we have had 0 major issues. There have been a few issues, but they have all been related to the migration of data from the HFS file system to ext3. As far as speed and stability go, it's a HUGE HUGE HUGE improvement over Apple, although that's not hard, because the Apple server OS just basically doesn't work; the client software works, just not very well.
We have been running SLES for around 3 weeks now, and we are getting more and more users asking to be put on the server after the positive feedback from the original test group.
I'm sure we will be testing 10.6.5 when it eventually gets released. However, even if it does fix the issue (and I don't think it will), we will probably still be running with Linux.
If anyone wants any help moving away from Apple servers and software, please let me know and I can provide some help building the Linux server.