rpc.lockd CPU usage over the top

Hello everyone,

I just bought a new Mac Mini Server (2.6 GHz, 4 GB memory) running Mac OS X Snow Leopard Server 10.6.5.

It provides network services and acts as the Open Directory master.
The clients are all Linux systems and access their home directories via NFS.
At first everything worked very well, but then users started to see programs freeze whenever they tried to access the home directory.
The client log states the following when this happens:
Nov 30 10:06:29 clientName kernel: 7913.576030 lockd: server macminiserver not responding, still trying

This makes the client unusable!

On the server, you can see that the rpc.lockd process uses all available CPU power, sometimes one core, sometimes both. Only killing the process and/or restarting helps. It makes sense that if the process on the server hangs, all clients freeze too, but how can I avoid this? All clients need to log in again as well. This is very frustrating, and nobody can work uninterrupted.

I saw people having similar problems in this thread, but it is archived now:
http://discussions.apple.com/thread.jspa?messageID=11089358

Is there a solution or a workaround?
Do I need this locking process, or can I somehow disable locking on the client side?

Thank you.

Benjamin


Mac Mini Server, Mac OS X (10.6.5), Server Edition

Posted on Dec 1, 2010 2:10 AM

5 replies

Dec 1, 2010 10:06 AM in response to raleighr3

Hi raleigh,

Thank you for your suggestion. I turned verbose mode on and restarted.
Then I started my clients. This is what I got in /var/log/system.log:

Dec 1 18:49:22 server rpc.statd[72]: Failed to contact host clientName: RPC: Port mapper failure - RPC: Timed out
Dec 1 18:52:56 server rpc.lockd[138]: *** process 138 exceeded 500 log message per second limit - remaining messages this second discarded ***
Dec 1 18:54:01: --- last message repeated 15 times ---
Dec 1 18:53:22 server rpc.lockd[138]: *** process 138 exceeded 500 log message per second limit - remaining messages this second discarded ***



This looks kind of strange to me.
I checked for the RPC ports:
On my client I executed: rpcinfo -p
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper

And I checked the open ports on the server with: nmap server
111/tcp open rpcbind

So the ports should be OK, I guess.
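
A quick way to see whether the lock-related RPC services are registered at all is to look for nlockmgr (the NFS lock manager) and status (rpc.statd) in the rpcinfo -p output, not only for portmapper. The following is a minimal sketch; the helper name missing_lock_services is made up for illustration, and it assumes the service name is the last column of each row, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `rpcinfo -p` output on stdin and prints the
# NFS locking services (nlockmgr, status) that do NOT appear in it.
missing_lock_services() {
    input=$(cat)
    missing=""
    for svc in nlockmgr status; do
        # Assumption: the service name is the last column of each row.
        if ! printf '%s\n' "$input" |
            awk -v s="$svc" '$NF == s { found = 1 } END { exit !found }'; then
            missing="$missing $svc"
        fi
    done
    # Strip the leading space; prints an empty line if nothing is missing.
    printf '%s\n' "${missing# }"
}

# Example, run on the server against a client (clientName as in the logs):
#   rpcinfo -p clientName | missing_lock_services
```

If both names are printed, the host being queried is not advertising its lock services, which may be worth checking before looking further at the server.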

How can I make it show me the discarded messages?

Thank you

Ben


PS: a few additional hints.
I ran: fs_usage -w rpc.lockd

It called select() over and over, many times per second.
Another thing that caught my eye came up when I inspected the process in Activity Monitor:
it had over 2 billion Unix system calls and was adding probably a few million more every few seconds.

I also sampled the rpc.lockd process while it was running at maximum CPU. Maybe someone can read something out of it:
2157 select$DARWIN_EXTSN
This call runs over and over again.


Analysis of sampling rpc.lockd (pid 9985) every 1 millisecond
Call graph:
2633 Thread_52623 DispatchQueue_1: com.apple.main-thread (serial)
  2633 0x13776330c
    2633 0x137767ea8
      2161 0x137766fd2
        2157 select$DARWIN_EXTSN
        4 0x137766fd2
      384 0x137766f9a
        383 getdtablesize
        1 0x137766f9a
      32 0x13776700f
        32 gettimeofday
          25 __gettimeofday
            14 __nanotime
            11 __gettimeofday
          7 gettimeofday
      25 0x137766fac
        21 __memcpy
        2 0x137766fac
        1 __bcopy
        1 bcopy
      10 0x137767059
        7 0x13776931f
          3 0x1377680fb
          2 0x13776812d
          1 0x137768115
          1 0x137768165
        1 0x137769309
        1 0x13776930d
        1 0x137769394
      5 0x137767029
        3 svc_getreqset
        2 0x137767029
      4 dyld_stub_getdtablesize
      3 dyld_stub_svc_getreqset
      2 0x137766f9c
      2 0x137766fd4
      2 dyld_stub_bcopy
      1 0x137766fa4
      1 0x137766fc6
      1 dyld_stub_select$DARWIN_EXTSN

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
select$DARWIN_EXTSN 2157
getdtablesize 383
__memcpy 21
__nanotime 14
__gettimeofday 11
gettimeofday 7

Dec 1, 2010 10:14 AM in response to ben_k42

This is the output from when lockd has not hung yet. When it hangs, it runs the "select S=" part over and over.


19:13:02.352 socketpair 0.000013 rpc.lockd
19:13:02.352 sendto F=10 B=0x45 0.000011 rpc.lockd
19:13:02.352 sendmsg F=10 B=0x1 0.000005 rpc.lockd
19:13:02.352 close F=34 0.000001 rpc.lockd
19:13:02.352 recvfrom F=33 B=0x4 0.000095 W rpc.lockd
19:13:02.352 close F=33 0.000004 rpc.lockd
19:13:02.352 recvfrom F=10 B=0x1c 0.000002 rpc.lockd
19:13:02.352 recvfrom F=10 B=0x45 0.000002 rpc.lockd
19:13:02.352 select S=0 0.000004 rpc.lockd
19:13:02.352 sendto F=10 B=0x1c 0.000004 rpc.lockd
19:13:02.352 close F=32 0.000005 rpc.lockd
19:13:02.365 write F=29 B=0x28 0.000023 rpc.lockd
19:13:07.398 select S=1
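
To put numbers on how often each call appears, a captured trace like the one above can be summarized per system call. This is a minimal sketch, assuming the call name is the second whitespace-separated field, as in the lines above; the function name syscall_histogram and the file name trace.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an fs_usage-style trace on stdin and prints
# "<count> <call>" lines, most frequent call first.
syscall_histogram() {
    # Assumption: field 1 is the timestamp and field 2 is the call name.
    awk 'NF >= 2 { count[$2]++ }
         END { for (name in count) print count[name], name }' |
        sort -k1,1nr -k2,2
}

# Example, with the trace saved to a file first:
#   syscall_histogram < trace.txt
```

Comparing the summary of a trace taken during normal operation with one taken while rpc.lockd is spinning should show how strongly select dominates in the hung state.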

Dec 1, 2010 12:22 PM in response to ben_k42

You can adjust syslogd's messages-per-second limit by editing /etc/asl.conf.
Run man asl.conf to see the parameters you can set and their format.
I think you'll need something like this:
= mps_limit 0
This will disable that check altogether.

I don't really see much in the logs that is suspicious to me.

Are any of the clients trying to access the same file(s) simultaneously? If not, and your NFS client supports it, you could try enabling the locallocks mount option as a workaround.

Are you specifying your exports by name or by IP? Try using IP if you're using names now. What other mount options are you specifying on the client side?
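
On a Linux client, a comparable option is nolock, which keeps locks local to the client instead of sending them to the server's lock daemon. Such locks only protect against other programs on the same client, so this fits only if no file is shared between machines. A minimal sketch follows, assuming the share macminiserver:/Users is mounted on /home (the mount point is a guess, and the helper nolock_mount_cmd is hypothetical); it only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a Linux mount command that
# disables NLM locking for an NFS share.
nolock_mount_cmd() {
    # $1 = server name, $2 = exported path, $3 = local mount point
    printf 'mount -t nfs -o nolock %s:%s %s\n' "$1" "$2" "$3"
}

# Example (the /home mount point is an assumption):
#   nolock_mount_cmd macminiserver /Users /home
```

The same nolock option can also go into the options column of the client's /etc/fstab entry for the share.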

Check out nfsstat and see if that has any helpful data.

-raleigh

Dec 2, 2010 2:45 AM in response to raleighr3

The clients are not accessing files simultaneously. Only the /Users folder is mounted, and everyone only accesses their own home folder. (By the way, how can I set up the share so that userA can only access /Users/userA? At the moment everyone can read and write everyone's home folder.)

I'll add the mps_limit parameter and get back to you.

I got a hint that I should upgrade to NFSv4. Is this supported by the Mac server? How can I enable it?

thank you.
ben
