AFP server issue - very high CPU load

Hello,

I googled and searched this forum for a long time, but I found no solution.

My problem is that my Mac OS X Server 10.5.4 machine, with about 30 users on networked home directories, has an issue with the AFP server. The AFP server process uses all 8 cores of this newest Intel Xserve, which has 14 GB of RAM installed. When this happens, all users get a spinning wheel, and the incoming network traffic drops to a few KB.

OK, so all users shut down their clients, I restart the server, and about 30 minutes later I have the same problem.

I have captured the network traffic with Wireshark, and in it I see some TCP retransmissions.
Now I need someone who can help me analyse the Wireshark capture, because I can't handle that myself.

So if there is someone out there who can help me, please send an email to support@premedia.at so that I can send you the Wireshark log.

Thank you in advance.

Macbook Pro, Mac OS X (10.5.4)

Posted on Aug 29, 2008 2:47 AM

279 replies

Sep 10, 2009 5:29 AM in response to shifty.aimless

Make sure you have the latest firmware and/or drivers for your 3rd-party hardware installed.

At our site, we had similar problems with exploding CPU usage, seemingly by AFP, which turned out to be caused by a buggy driver for an ATTO SCSI card. After updating the driver to the latest version, our system has been running without problems for more than a month now.

Sep 10, 2009 5:32 AM in response to shifty.aimless

Hello from Saudi Arabia,

you are absolutely right, and I have never ever had an issue like this in my (> 20 years) career. I'd like to ask you a question regarding the external Promise:

We too have a Promise RAID (12TB) attached to all our servers via Apple's 4-port 4Gb card. I never thought of this, but maybe the trouble is related to a combination involving an FC card and a Promise RAID. Could it be we see something stuck on the PCIe bus?

Do you use the internal drives or only the Promise? Have you tried other RAID Levels?

From my side, we are not using AFP at all. The speed and load issue is affecting us on SMB and NFS in a similar way. Fact is, when we experience this phenomenon, shortly before the services and the load go crazy, the read and write speed of the internal RAID 5 goes down to less than 1 MB/sec, while I normally see approximately 120 MB/sec. The whole thing is ridiculous.

However, ignoring my internal drives, using the FW drives and the Promise via FC Card works fine on the same servers, running the same Leopard OS without issues.

Maybe some other users can share information on FC Card and Promise Raid with us?
We have 32 GB in all our servers, maybe that's a point, too?

Thanks in advance!

Sep 10, 2009 5:56 AM in response to Anabeeb

From my side, we are not using AFP at all. The speed and load issue is affecting us on SMB and NFS in a similar way.


This is interesting. It seems to imply an underlying system-level problem, probably I/O related.

This could be interesting for all the others having this problem: What if you simply switch off AFP and do all your filesharing through SMB? What happens?
Another thing one might want to look at is the CPU usage of kernel_task (PID 0).

To do so, open a Terminal window and execute
top -o+command -s 5

Have a look at the CPU usage values of both AppleFileServer and kernel_task.
When AFP's CPU consumption blows up, how does kernel_task behave? Does it stay low, or does it rise as well, maybe even ahead of the rise in AFP? The latter case is another indicator of a system-level problem, probably caused by I/O components.
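
A rough sketch of how that comparison could be logged over time instead of watched live in top. It assumes that `ps -A -o %cpu= -o comm=` prints one "%CPU command" line per process on your system; the function names `cpu_of` and `sample_once` and the log path are made up for this example.

```shell
#!/bin/sh
# Sum the %CPU column for every process whose command basename equals $1.
# Reads lines of the form "<%cpu> <command path>" on stdin
# (the assumed output of: ps -A -o %cpu= -o comm=).
cpu_of() {
    awk -v name="$1" '
        {
            cmd = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", cmd)   # drop the %CPU column
            n = split(cmd, parts, "/")             # keep only the basename
            if (parts[n] == name) total += $1
        }
        END { printf "%.1f\n", total + 0 }
    '
}

# Take one snapshot and print a timestamped line for both processes.
sample_once() {
    snapshot=$(ps -A -o %cpu= -o comm=)
    afp=$(printf '%s\n' "$snapshot" | cpu_of AppleFileServer)
    kern=$(printf '%s\n' "$snapshot" | cpu_of kernel_task)
    printf '%s AppleFileServer=%s kernel_task=%s\n' \
        "$(date '+%H:%M:%S')" "$afp" "$kern"
}

# Example loop (hypothetical log path), sampling every 5 seconds:
# while :; do sample_once; sleep 5; done >> /tmp/afp_cpu.log
```

Reading the log afterwards shows whether kernel_task starts climbing before AppleFileServer does, which is hard to judge by eye while the server is under load.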

Sep 10, 2009 7:28 PM in response to Larry Dougher

We've been running AFP member servers in an OpenDirectory (OD) & WGM setup with Portable Homes, etc.

It ran fine for a few years, as we upgraded the server and client hardware, server and client OSes, etc, always working pretty well.

Suddenly without warning late last year, the AFP CPU pegged and everything slowed to a crawl. We tried everything on the client and server side that we could think of, even putting in new fileservers to lighten the load, etc. We had consultants and Apple Enterprise Support going, all without any solutions....everyone was stumped.

After days of troubleshooting while the users freaked out, we eventually stumbled on the problem: a flaky WGM machine group (we use machine groups in WGM to control client setups). When we created new groups and moved the machines to them....it all started working perfectly again.

So it ended up being an OpenDirectory/WGM issue, not a fileserver/AFP issue, although it sure didn't look that way.

For anyone in the same type of configuration that is seeing high AFP load, I'd suggest doing a "rebuild" of your directory infrastructure:
* Set all your Replicas to Standalone
* Do an OD backup (or two) of your Master from Server Admin
* Export all Users/Groups/Machines to text files (just to be safe)
* Set your Master to Standalone
* Reboot your Master
* Restore your Master. You might have to use the Terminal to do it ("$ sudo slapconfig -restoredb /path/to/backup") because the GUI restore sometimes doesn't work.
* Recreate the Replicas

That kind of "rebuild" is easy to do and has fixed weird OD issues for me on more than one occasion.

You can also try just recreating your groups to see if that helps.

Sep 11, 2009 4:43 AM in response to JohnDCCIU

glad that resolved your problem... you're one of the lucky ones..

unfortunately that didn't resolve our issue. we have a single standalone server running AFP, OD, SMB, with network homes. like you, everything worked fine for years, and then the AFP CPU issue popped up. recently, out of desperation, we nuked the server and home folders, reinstalled the OS, completely rebuilt all of the users and groups by hand from scratch, no imports, no LDAP restores... then restored only the users' documents into their new home folders to avoid any pref or library issues... no love... within an hour of being back online the server was being stupid again.

Sep 15, 2009 3:45 PM in response to gen_bunty

Well, the solution for our setup was disabling 'Allow Host Cache Flushing' in the RAID Admin software. It might help someone else to look into that as well. I'm not taking credit for this, since I found it mentioned in another thread in this forum regarding CPU hogging. Thanks for that, BTW!

Anyway, our CPUs maxed out after a clean installation this summer, going from 10.4.11 to 10.5.8, on our four G5 Xserves with an Apple RAID 14TB setup. Apparently that cache setting had no effect under 10.4, but with 10.5 it sure messed things up. I just went with a 10.5 setup using the same basic settings I had under 10.4. (That had worked quite all right, apart from when users searched the server and it ground to a halt with catsearch, which was also the main reason we moved to 10.5 Server.)

Our load was all right up to ~80-90 connected AFP roamed user homes (per server), at which point the CPU maxed out at 100%, not dropping until the connected user base was down to ~40 again. The I/O was just terrible as well, no matter how many users were connected.

Now with that Cache Flushing off, we're back to normal 30–40% load, with 120–140 users.

All clients are tweaked for wan/quantum as mentioned elsewhere. The cache folder is always redirected to the local client. The clients are about 300 on 10.4 and 250 on 10.5, mixed PPC/Intel base. (BTW, for home folder mounting, safekeeping and application distribution we use MacAdministrator, and also ARD for OS updates.) The servers are connected with link aggregation at gigabit speed to the backbone, and the clients are on 100-base.

Now I just wish I could redirect more in the Adobe CS3 suite as well, since most CS apps won't work redirected. And even getting properly working server-side save support from Adobe would be nice indeed, some 20 years later or so.

later,
Jesper

Oct 5, 2009 9:12 AM in response to Adam Gerson

FWIW, I'm running 10.5.8 on my AFP server (the one that had the high-CPU problem previously) and it's working like a champ, so if you're having trouble under 10.5.8 then it may be some kind of configuration issue (there's lots of AFP tuning advice out on the 'net), or if you're in an OpenDirectory environment then it could be something like the issue that I posted earlier that we experienced.

Just to be thorough, it couldn't hurt to do an OD Archive and then an OD Restore (there is a series of steps) to see if that resolves the issue. Or if you have Machine Groups, I'd suggest creating new ones and moving your machine records to them.

Oct 5, 2009 2:14 PM in response to gen_bunty

We upgraded to 10.6.1 and the problem is gone. We previously had 10.5.4, which worked, but every later version through 10.5.8 exhibited the problem, so each time I would roll back to 10.5.4. 10.6.1 has not exhibited the problem in at least a week since I installed it. We did a standard upgrade from 10.5.4 to 10.6.1. I did not need to reformat and re-enter data.

Oct 6, 2009 1:12 AM in response to netboy-uk

Hi

Apologies, I am a little late in this post.

I can confirm that 10.6 has so far fixed our problem with AFP causing high CPU. I did a fresh install of 10.6 on two of our home folder servers, running the latest Xserve hardware connected via the latest Fibre Channel cards to Apple RAIDs. These servers have now been running for 20+ days without a hitch. I am truly amazed at the difference in stability, as I really did not have any confidence that it would work considering the issues throughout 10.5.

Bests

Oct 8, 2009 7:05 PM in response to Manfred Rumpl

Our studio hit the AppleFileServer 100% CPU problem and we resolved it after finding the string "HFS resolvelink" in system.log. This was on an XRaid of 2.3TB (6 drives RAID-5 and one hotspare) hosted on a Leopard server for Sharepoints and network homes.

Around January this year I escalated the problem to Apple and was eventually put in touch with Server Support back in the US (I'm in Australia). They suggested that we look out for messages containing "HFS resolvelink" in system.log.

Aug 5 14:33:29 server kernel[0]: HFS resolvelink: can't find iNode85654098
Aug 5 14:58:55 server kernel[0]: HFS resolvelink: can't find iNode85657485
Aug 5 15:02:27 server kernel[0]: HFS resolvelink: can't find iNode85657842
Aug 5 15:27:27 server kernel[0]: HFS resolvelink: can't find iNode85640299
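
For anyone who wants to check their own log for messages like the ones quoted above, here is a minimal sketch. The function names are hypothetical, and the log location is an assumption (pass whichever file your system.log actually is).

```shell
#!/bin/sh
# Count the "HFS resolvelink" kernel messages in the log file given as $1.
resolvelink_count() {
    grep -c 'HFS resolvelink' "$1"
}

# List the distinct iNode names those messages could not find.
resolvelink_inodes() {
    grep 'HFS resolvelink' "$1" \
        | sed -n 's/.*find \(iNode[0-9][0-9]*\).*/\1/p' \
        | sort -u
}

# Example (assumed log path):
# resolvelink_count /var/log/system.log
# resolvelink_inodes /var/log/system.log
```

A count of 0 only means the message is not in that particular file; rotated or archived logs would need to be checked separately.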

We performed two backups of the volume:
- we installed a 3TB RAID-0 in an 8-core Mac Pro tower, connected via FireWire 800, and ran ditto from Terminal (to preserve ACLs and POSIX permissions)
- we saved a .dmg of the entire volume onto a separate Xsan

After we were satisfied we had everything backed up, we then re-formatted the volume, using security options to zero out the entire RAID, then used Ditto to put everything back again - the whole process took about 38 hours.
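
One crude way to sanity-check a copy before wiping the original is to compare how many files and folders each tree contains. This is only a sketch with made-up function names and example paths, not what was actually run here, and matching counts do not prove that contents, ACLs or permissions are identical.

```shell
#!/bin/sh
# Count every file and directory under $1 (the top folder itself included).
# Names containing newlines would be counted more than once.
entry_count() {
    find "$1" | wc -l | tr -d ' '
}

# Succeed only when both trees contain the same number of entries.
same_entry_count() {
    [ "$(entry_count "$1")" = "$(entry_count "$2")" ]
}

# Example (hypothetical volume names):
# same_entry_count /Volumes/Original /Volumes/Backup && echo "counts match"
```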

Some ACLs needed to be re-applied (don't forget to add Spotlight as a user or client machines won't search the mounted sharepoints)

After that we had no further issues - the HFS resolvelink error no longer appeared in the system.log and we haven't been hitting 100% cpu since.

I hope this helps people - post any success back to the forum.

Oct 15, 2009 12:10 PM in response to Manfred Rumpl

I had the same problem with Mac OS X Server 10.5.8 and 70 network users logged in at the same time.

I resolved the problem quickly by moving the network users' cache to the local iMac drive. Your server needs to be configured in Advanced mode so you can use Workgroup Manager. For instructions, go to page 10, "Reducing afp load on your servers", of this PDF: http://www.afp548.com/filemgmt_data/files/Leopard%20Server%20Quickstart%20Guide.pdf

Cons: It takes a bit longer for the user to log in...don't know why!
Pros: Average CPU load dropped from 100% to 10%
