
NFS bug in 10.6.8, PCP Server. xgrid clients, NFS timeouts/disconnects, jobs fail.

Nasty NFS bug in 10.6.8 running a PCP Server: xgrid clients get NFS timeouts/disconnects, and jobs fail. Is anyone else experiencing this issue?


Jul 29 17:01:34 servername org.machx.snmp-data[46790]: Time to sleep 60 seconds

Jul 29 17:02:34 servername org.machx.snmp-data[46790]: done

Jul 29 17:02:41 servername KernelEventAgent[48]: tid 00000000 received event(s) VQ_NOTRESP (1)

Jul 29 17:02:41 servername KernelEventAgent[48]: tid 00000000 type 'nfs', mounted on '/Network/Servers/someserver.some.edu/Volumes/PCP2_Media/Podcast_Producer_Library', from 'someserver.some.edu:/Volumes/PCP2_Media/Podcast_Producer_Library', not responding

Jul 29 17:02:41 servername KernelEventAgent[48]: tid 00000000 found 1 filesystem(s) with problem(s)

Jul 29 17:02:58 servername KernelEventAgent[48]: tid 00000000 unmounting 1 filesystems

Jul 29 17:03:08 servername pcastaction[46809]: PodcastProducer::Actions::QTImport: FINISH
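For anyone trying to work out how often this happens on their own machines, here is a rough Python sketch that pulls the NFS "not responding" events out of log lines shaped like the ones above. The regular expression is an assumption based only on the lines quoted in this thread, and the function name `find_unresponsive_mounts` is made up for illustration, so adjust both if your log format differs.

```python
import re

# Pattern modelled on the KernelEventAgent lines quoted in this thread.
# Assumption: other OS versions may format these lines differently.
NOT_RESPONDING = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"KernelEventAgent\[\d+\]: .*type 'nfs', "
    r"mounted on '(?P<mount>[^']*)', "
    r"from '(?P<source>[^']*)', not responding"
)


def find_unresponsive_mounts(lines):
    """Return (timestamp, mount point, NFS source) for each matching line."""
    events = []
    for line in lines:
        match = NOT_RESPONDING.search(line)
        if match:
            events.append(
                (match["timestamp"], match["mount"], match["source"])
            )
    return events
```

You could feed it the lines of a saved copy of the system log, for example `find_unresponsive_mounts(open("system.log"))`, where the file name is just a placeholder.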


* Mac Pro running 10.6.8v2 with two Mac Pro xgrid clients.

* When the clients access the NFS share, the server stops responding and all TCP connections die. After a few seconds to a few minutes it responds again, but the jobs fail.

* Not surprisingly, the NFS disconnects have caused clients to kernel panic.

* I reverted to 10.6.7 and the issue subsided.


Is anyone else having PCP/NFS/Xgrid issues with 10.6.8 Server?


Any insight would be greatly appreciated.

Posted on Jul 30, 2011 5:18 AM


2 replies
Question marked as Best reply

Nov 14, 2011 1:12 PM in response to matuzalem

Just encountered this bug with 10.6.8 - seems no help is out there, going to revert to 10.6.7






Nov 14 14:27:22 archive KernelEventAgent[66]: tid 00000000 type 'nfs', mounted on '/Volumes/isilon', from '192.168.10.24:/ifs/data/', not responding
Nov 14 14:27:22 archive KernelEventAgent[66]: tid 00000000 found 1 filesystem(s) with problem(s)
Nov 14 14:29:14 archive KernelEventAgent[66]: tid 00000000 received event(s) VQ_NOTRESP (1)


Matthew Ziegele

Nov 17, 2011 9:49 AM in response to Matthew Ziegele

Just a thought, and it may not apply here... but - I've seen timeouts with Macs and NFS servers (I have experience with Isilon) when a Mac attempts to delete a large file. The Mac times out waiting for confirmation from the server that the file has been deleted. There are ways to change the NFS timeouts on the Mac to make it less likely it'll hit a timeout. Or, the Isilon can be upgraded to OneFS 6.5 or later, which can return the delete confirmation much faster to the client.


The default timeouts on the Mac for an unresponsive NFS mount are controlled via a couple of options in /etc/nfs.conf. From the nfs.conf man page...


nfs.client.initialdowndelay
    When an NFS server is not responding, this option specifies how
    long to wait (in seconds) before the initial notification is
    posted. The default is 12 seconds.

nfs.client.nextdowndelay
    When an NFS server is not responding, this option specifies how
    long to wait (in seconds) between notifications. The default is
    30 seconds.


So basically, if you increase the initialdowndelay value from its default of 12, you may avoid the timeout message. Alternatively, upgrading OneFS may be the better option, since it doesn't require going around to all of your Macs to make these config changes.
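If you do go the config route, something like the following Python sketch could generate the updated file contents. It assumes /etc/nfs.conf uses one `option=value` entry per line with `#` comments, and the helper name `set_nfs_option` and the value 60 are purely illustrative, not recommended settings. It only transforms text; writing the result back to /etc/nfs.conf (as root) is left to you.

```python
def set_nfs_option(conf_text, option, value):
    """Return conf_text with `option` set to `value`.

    An existing uncommented entry for the option is replaced
    (duplicates are dropped); otherwise the entry is appended.
    Assumes one "option=value" entry per line, "#" for comments.
    """
    new_line = f"{option}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_entry = not stripped.startswith("#") and "=" in stripped
        if is_entry and key == option:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any later duplicate of the same option is skipped.
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Illustrative value only: raise the initial delay from 12 to 60 seconds.
example = set_nfs_option(
    "# local NFS settings\nnfs.client.initialdowndelay=12\n",
    "nfs.client.initialdowndelay",
    60,
)
```

Keep a backup of the original file before replacing it, and test on one Mac first to see whether the longer delay actually avoids the disconnects.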
