AFP server issue - very high CPU load

Hello,

I googled and searched this forum for a long time but found no solution.

My problem is that my Mac OS X Server 10.5.4, with about 30 users on networked home directories, has an issue with the AFP server. The AFP server process uses all 8 cores of this latest Intel Xserve, which has 14 GB of RAM installed. When this happens, all users get a spinning wheel, and incoming network traffic drops to a few KB.

OK, all users shut down their clients, I restart the server, and about 30 minutes later I have the same problem.

I have captured the network traffic with Wireshark, and in the capture I see some TCP retransmissions.
Now I need someone who can help me analyse the Wireshark capture, because I can't handle that myself.

So if there is someone out there who can help me, please send an email to support@premedia.at so that I can send you the Wireshark log.

Thank you in advance.

MacBook Pro, Mac OS X (10.5.4)

Posted on Aug 29, 2008 2:47 AM

279 replies

May 14, 2009 4:37 AM in response to Manfred Rumpl

Dear all,

I have been following this thread for some months now, with problems quite similar to those described here.

What we have discovered so far is the following:

1) Trash folder matters! We experienced recurring high CPU usage and the spinning pizza of death. We were completely lost until a user told us that when he trashed a file, it took 5-10 minutes to do so. We inspected that user's .Trash folder and it held 200k files and 35 GB. Fixed now.

2) Mobile Accounts are dangerous in some ways. We had another high-CPU scenario on some days, and on those days a specific user with a MBP was present. We inspected his home folder and found he was using the Desktop folder for "temporarily" storing movies for his trips. Whenever he came back, a new movie would be synced. We created a /Personal folder and told him to store everything "personal and fat" there.

3) Take care of your network infrastructure: QoS may be your friend. Give the network outlets of Mobile Account users a lower QoS priority than those of Network Home Folder users. That helps.

4) The Firefox file handling + Network Home Folders combo is crap. We are going to disable Firefox's cache on all machines. We have a 1 Gbps connection, so it won't be a problem for us. The load the cache imposes on the network is far greater than that of the web browsing itself.

5) We disabled Spotlight but it didn't help very much. We are going to re-enable it.
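
For anyone who wants to check point 1 on their own server, here is a minimal sketch that reports the file count and size of a user's .Trash folder. The function name trash_report and the /Volumes/Data/Homes path in the usage comment are only placeholders (assumptions, not something from our setup), so adjust them to wherever your home directories live.

```shell
#!/bin/sh
# trash_report HOME_DIR
# Prints "<file count> <size in KB> <home dir>" for HOME_DIR/.Trash.
# Hypothetical helper: the name and output format are illustrative only.
trash_report() {
    trash="$1/.Trash"
    # No .Trash folder: report zeros instead of failing.
    [ -d "$trash" ] || { echo "0 0 $1"; return 0; }
    # Count regular files anywhere below the trash folder.
    count=$(find "$trash" -type f | wc -l | tr -d ' ')
    # Total size in kilobytes.
    size_kb=$(du -sk "$trash" | awk '{print $1}')
    echo "$count $size_kb $1"
}

# Usage (path is an assumption; run as a user who can read the homes):
#   for home in /Volumes/Data/Homes/*; do trash_report "$home"; done | sort -rn | head
```

Sorting the output numerically puts the homes with the most trashed files at the top, which is how a 200k-file Trash would stand out.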

May 15, 2009 7:15 AM in response to lmartinsantos

I think it would be helpful if everyone that is experiencing this problem could reply back with some additional information that may help the group as a whole try and find some patterns.

1. Authentication source: Open Directory, Active Directory, etc.
2. If your directory server is Open Directory, was it upgraded from a past version of Mac OS X or was it newly created in 10.5?
3. Total users in the directory server
4. Does your directory server run on a dedicated server?
5. How do they authenticate to the AFP server: Standard or Kerberos?
6. On average, how many users are connected to each of your AFP servers when the CPU usage increases?
7. Have you moved System/Library/KerberosPlugIns/KerberosAuthData/odpac.bundle as described in earlier posts?
8. Are you using just network homes, just mobile homes, or a mix of both?


So here are my answers. If anyone has other questions you think we should ask, just add them in.

1. Open Directory
2. Upgraded from 10.3 -> 10.4 -> 10.5
3. 5000
4. Dedicated server Xserve G5
5. We're now using standard but in the past always used kerberos
6. 70
7. Yes
8. Network Homes

May 18, 2009 11:21 AM in response to Jason Tallman

1. OD
2. Clean Install Every summer
3. 2500
4. Dedicated Mac Pro for OD, dedicated Intel Xserve for home directories
5. Standard
6. 60
7. No (As it did not seem to fix the problem in other posts)
8. Network Homes


Another question that might need to be asked: on what version of 10.5 did you discover the issue, and what version are you on now?

9. 10.5.5
10. 10.5.6

May 20, 2009 4:22 PM in response to Jason Tallman

1. OD
2. new 10.5
3. 250
4. no
5. standard
6. 60 AFP, 50 SMB
7. no
8. both (mostly networked)

Still on 10.5.4. System partition: 80 GB, 50 GB free. Data partition: 780 GB, 52 GB free.

For a while I was under the impression that freeing up space on our data partition to above 60 GB had helped. But this week it did it again.
We reboot the server each Monday at 5:00. Before the cleanup, AFP would go crazy every week on Thursday or Friday (always when the server is busy; I never observed it during the night). After cleaning up the disk, AFP was stable for about 4 weeks (still with the weekly reboot).

Has anybody found out what AFP is actually doing with all that CPU? With my dual core, the idle load seems to be about 50% above normal. It looks as if one CPU is fully loaded, but actually the load is distributed evenly between the two cores.

Periodically free memory goes down to almost 0, with zero-filled inactive pages taking it all up. Then the server recovers and we have 2 GB free again (of 4 GB). This can happen within half an hour.
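
To put numbers on that free-memory swing, a rough sketch like the one below could log it over time. It assumes the usual vm_stat output layout (a header line containing "page size of N bytes" and a "Pages free:" line); the helper name free_mb is made up for this example.

```shell
#!/bin/sh
# free_mb: read `vm_stat` output on stdin and print free memory in whole MB.
# Assumes the header carries "page size of N bytes" and that the free-page
# count is the third field of the "Pages free:" line, followed by a period.
free_mb() {
    awk '
        /page size of/ {
            for (i = 1; i <= NF; i++) if ($i == "of") pagesize = $(i + 1)
        }
        /^Pages free:/ { gsub(/\./, "", $3); free = $3 }
        END {
            if (pagesize == "") pagesize = 4096   # fallback if no header seen
            printf "%d\n", free * pagesize / 1048576
        }
    '
}

# Usage: log one sample per minute until interrupted.
#   while true; do
#       echo "$(date '+%H:%M:%S') $(vm_stat | free_mb) MB free"
#       sleep 60
#   done
```

Redirecting that loop to a file makes it possible to line the drops up with the AFP CPU spikes afterwards.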

Stopping and restarting AFP is not possible, I need to reboot the whole server.

May 23, 2009 1:08 PM in response to Manfred Rumpl

I'm just trying to work out if you are all seeing an issue I had last year that we eventually cracked. Can you all check your AFP access logs and see if you have a lot of "CatSearch" entries? What we found was that users were doing a 'Find' (apple+F) on the AFP volume from the client, but after the find completed the server wasn't releasing the memory that was needed for each search, eventually resulting in the server fully locking up and rebooting. You can see it happening if you leave Activity Monitor open on the memory tab: you will see your free memory slowly vanishing.
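
A quick way to run that check is sketched below. The log path in the usage comment is an assumption (it is where the AFP access log usually lives on Mac OS X Server, but confirm it in Server Admin), and the helper name catsearch_count is made up for this example.

```shell
#!/bin/sh
# catsearch_count LOGFILE
# Prints the number of lines in LOGFILE that mention "CatSearch".
catsearch_count() {
    # grep -c prints the count of matching lines; "|| true" stops a zero
    # count (exit status 1) from aborting callers that run under "set -e".
    grep -c 'CatSearch' "$1" || true
}

# Usage (path is an assumption, adjust to your server):
#   catsearch_count /Library/Logs/AppleFileService/AppleFileServiceAccess.log
```

Comparing the count with the total number of lines in the log (`wc -l`) gives a feel for whether "a lot" applies.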
The fix I have is for Intel Xserves only, or for those who used a UB install disc, as this file doesn't exist on PPC systems. This fix is thanks to AppleCare Engineering.

Correct the memory leak issue on Intel Xserves that causes them to hang

1. Make a copy of /etc/rc.server.

$ sudo cp /etc/rc.server /etc/rc.server.bak

2. In the text editor of your choice, make the following change to /etc/rc.server. You will need to edit this file as the root user.

Locate this line in /etc/rc.server:

sysctl -w kern.maxnbuf=90000

Change the line to read:

sysctl -w kern.maxnbuf=20000

Save the changes you have made to /etc/rc.server.

3. Verify the changes were made to /etc/rc.server

4. Reboot the server.
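
If you would rather script steps 1 to 3 than edit by hand, something along these lines should do it. This is only a sketch: the function name set_maxnbuf is invented for the example, and it replaces whatever numeric value currently follows kern.maxnbuf=, so check the printed line before you reboot.

```shell
#!/bin/sh
# set_maxnbuf FILE NEW_VALUE
# Backs up FILE to FILE.bak, rewrites the numeric value after
# "kern.maxnbuf=" to NEW_VALUE, then prints the resulting line(s).
set_maxnbuf() {
    file="$1"
    new="$2"
    # Step 1: keep a backup copy.
    cp "$file" "$file.bak" || return 1
    # Step 2: rewrite from the backup into the original file, which
    # keeps the original file's ownership and permissions.
    sed "s/\(kern\.maxnbuf=\)[0-9][0-9]*/\1$new/" "$file.bak" > "$file" || return 1
    # Step 3: show the line so the change can be verified.
    grep 'kern.maxnbuf' "$file"
}

# Usage (run as root, e.g. from a root shell):
#   set_maxnbuf /etc/rc.server 20000
```

If the grep prints nothing, the file has no kern.maxnbuf line and nothing was changed apart from the backup being created.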


I really hope this solution can help some of you guys, as it was an incredibly frustrating issue.

May 25, 2009 5:29 AM in response to Stephen Moran1

We had this behaviour on our server under 10.5.2, 10.5.3 and 10.5.4 while we were using an external SCSI RAID (level 6) for the shares. I tried it with two RAIDs, with the same result!
After about a week of good behaviour, the CPU would start glowing again!
Since we switched to 3 striped 1.5 TB hard disks, I have never had the problem again!
And this is with the same server configuration; we only swapped the RAID for the internal discs!
It was hard to accept that we can't use the level 6 RAID and are using a level 1 instead!

Jun 8, 2009 8:06 AM in response to Stephen Moran1

Stephen, I think that fix might be strictly for Tiger Server and not Leopard Server. Leopard Server added some different adaptations in the rc.server file for different RAM sizes. So people won't find the line with 90000 as the default is now 60000 on Leopard Server.

My first clue was you mentioning a UB disc. All Leopard discs are inherently UB as Leopard is UB throughout. Same media for either PPC or Intel platforms. An installed system can be cloned to either a PPC or Intel and it will work.

Jun 16, 2009 6:57 AM in response to Manfred Rumpl

Just to keep everyone who hasn't given up up to date: I upgraded server and clients to 10.5.7. Maybe a bit better overall. Fewer spikes, smaller spikes, more times when things drop back to normal levels. But still days with 60% and 80% CPU. At least there's kill -9 xxxxx to make it usable.
Anyone want to start a pool for Snow Leopard? 🙂

Jun 30, 2009 6:11 AM in response to sammonDJ

sammonDJ wrote:


Spoke to Apple Enterprise Techs yesterday. I am assured the issue is known and the update that will resolve it is to be deployed in the not too distant future.

It looks like the key here is patience.....


And that was Feb. 9
I don't know what they mean by "not too distant future", but 5 months is not what I have in mind when I say something like that. This is getting ridiculous. Not buying Leopard Server would've saved us a lot of trouble and money. In fact, we've paid for trouble... 😟

