
Network home folder clients (10.8.2) freezing

Hi all,


I have a Mac OS X Lion Server (10.7.5, all updates, Mac mini server with TB RAID attached) serving network home folders to Mac OS X Mountain Lion 10.8.2 clients.


Some of our users are experiencing freezes that manifest shortly after login. It appears that the shared volume is no longer available, based on the following system log entries. Again, this only happens to some users, but those users have it happen consistently...


Any insight?


Mar 4 13:46:47 BMC-CCT3110-Chloroplast.local KernelEventAgent[47]: tid 00000000 type 'afpfs', mounted on '/Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers', from '//erink@www.ourURL.foo.foo/BMCusers', not responding

Mar 4 13:46:47 BMC-CCT3110-Chloroplast.local KernelEventAgent[47]: tid 00000000 found 1 filesystem(s) with problem(s)

Mar 4 13:46:47 BMC-CCT3110-Chloroplast.local KernelEventAgent[47]: tid 00000000 received event(s) VQ_NOTRESP (1)

Mar 4 13:46:47 --- last message repeated 1 time ---

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: ASP_TCP Disconnect: triggering reconnect by bumping reconnTrigger from curr value 8 on so 0xffffff802d71c370

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: ASP_TCP asp_tcp_usr_control: invalid kernelUseCount 0

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect started /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers prevTrigger 8 currTrigger 9

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: doing reconnect on /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: posting to KEA EINPROGRESS for /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: Max reconnect time: 600 secs, Connect timeout: 15 secs for /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: Logging in with uam 10 /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: Restoring session /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: ASP_TCP ReplayPendingReqs: replaying slot 7 with reqID 51198 afpCmd 0x44 on so 0xffffff802d71c370

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: get the reconnect token

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: ASP_TCP Disconnect: triggering reconnect by bumping reconnTrigger from curr value 9 on so 0xffffff802d71c370

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: ASP_TCP asp_tcp_usr_control: invalid kernelUseCount 0

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect started /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers prevTrigger 9 currTrigger 10

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: doing reconnect on /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast.local KernelEventAgent[47]: tid 00000000 type 'afpfs', mounted on '/Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers', from '//erink@www.ourURL.foo.foo/BMCusers', not responding

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: posting to KEA EINPROGRESS for /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mar 4 13:46:47 BMC-CCT3110-Chloroplast kernel[0]: AFP_VFS afpfs_DoReconnect: Max reconnect time: 600 secs, Connect timeout: 15 secs for /Network/Servers/www.ourURL.foo.foo/Volumes/User_partition/BMCusers

Mac OS X (10.6.7), OS X server

Posted on Mar 4, 2013 12:12 PM


Mar 12, 2013 12:10 PM in response to Nicholas Woolridge

Hello,


I don't know if this will help or not, but here are the first things I check when a user cannot log in.


1. Open Workgroup Manager, select the user, click Home.


2. Click the server.local/Users/ entry, NOT the server.com/Users/ entry


3. Set the disk Quota


4. Create Home Now


5. Save





6. Open Server.app and verify that the user's home folder is set to Custom




The user should be able to log in.


Hope this helps!


Thanks,


ebrind

Mar 12, 2013 1:09 PM in response to ebrind

Hi,


Ebrind, thanks for your suggestions, but I don't think it will help in my case (and my WG manager doesn't have the same entries; I have only one home directory entry).


Perhaps I should have been clearer: the problematic users are able to log in. Things seem OK: they reach their desktop and launch applications. Then things gum up: they get the spinning pizza, and nothing is able to get the computer to respond but a hard reset.


The logs I included seem to indicate an AFP issue in which the server volume becomes unresponsive: the system log is full of "afpfs" messages about unresponsive volumes and attempts to reconnect.
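For anyone comparing affected and unaffected accounts, here is a minimal sketch for tallying these messages per mount point. It assumes the KernelEventAgent line format shown in the excerpts in this thread; the function name count_afp_stalls is made up for illustration and is not an Apple tool.

```shell
# Count KernelEventAgent "not responding" reports per afpfs mount point.
# Reads syslog-style text on stdin and prints "<count> <mount point>" lines.
# Assumption: lines look like the excerpts posted in this thread.
count_afp_stalls() {
    grep "type 'afpfs'" \
        | grep "not responding" \
        | sed -n "s/.*mounted on '\([^']*\)'.*/\1/p" \
        | sort \
        | uniq -c \
        | sed 's/^ *//'
}
```

On a client it could be run as: count_afp_stalls < /var/log/system.log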


This only happens to a small number of the active users, and someone sitting next to a gummed up user in a lab could be happily working away off the same server volume. Those users to whom it happens seem to have it happen repeatedly (so it is somehow likely account related), and some users are untouched.

Apr 21, 2013 5:38 PM in response to cafarom

I can confirm that after 8 months of fighting this problem, this worked for us:

sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.metadata.mds.plist
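The command above unloads the Spotlight metadata server (mds) and, because of -w, keeps it disabled across reboots. As a rough sketch, the helper below prints both that command and its counterpart, launchctl load -w, which should turn the daemon back on. The helper name spotlight_cmd is hypothetical, and it only prints the commands rather than running them, so they can be checked before being pasted into a terminal.

```shell
# Path to the Spotlight metadata server job, as quoted in this thread.
MDS_PLIST=/System/Library/LaunchDaemons/com.apple.metadata.mds.plist

# Print (do not run) the launchctl command for the requested Spotlight state.
# spotlight_cmd is an illustrative name, not an Apple-provided tool.
spotlight_cmd() {
    case "$1" in
        off) echo "sudo launchctl unload -w $MDS_PLIST" ;;
        on)  echo "sudo launchctl load -w $MDS_PLIST" ;;
        *)   echo "usage: spotlight_cmd on|off" >&2; return 1 ;;
    esac
}
```

For example, spotlight_cmd on prints the command that would re-enable Spotlight on a client where it was previously unloaded.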


Cafarom, thank you so much!!


File a bug report with Apple about this. I just did, here is my report, hope it helps you.




Summary:

Some network accounts freeze soon after logging into OSX server. Upon logging in, soon after all the applications are reopened from the previous session, the problem starts (after about 1 minute from login). Applications will begin to hang one-by-one as they try to access the disk. This is fatal and the system must be hard-reset to recover.

This seems to be Spotlight-related: disabling Spotlight on the client machines seems to "solve" the problem.


Steps to Reproduce:

Not all accounts seem to be affected, and I have not yet been able to isolate the trigger. Once a network account becomes "infected", nothing can be done to fix it and the problem persists across all client computers. That is, trying to log in with the infected account from any client machine on the network will always produce the same degenerate behavior.


Once an account becomes "infected", it can be temporarily cured by backing up the contents, deleting it and recreating it on the server. However, the problem soon returns.


All client machines and the server are running OS X 10.8.3, and the clients are bound to the server. The server is the Open Directory master and the DNS server for all machines in the network. No other servers or non-Mac machines are on the network. The server is using a self-signed certificate and has the following services running: Calendar, Contacts, DNS, File Sharing, Open Directory, Profile Manager, Websites.


All network accounts use the following Home directory on the server: /Users/


Expected Results:

Upon logging in, applications respond normally, the system and applications do not freeze.


Actual Results:

Some network accounts will always cause the system and all applications to freeze starting after about 1 minute from logging in.


Workaround:

Permanently disable Spotlight on all client machines with: sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.metadata.mds.plist


This "fixes" the problem and the affected network accounts will now be able to log in without freezing.


Regression:

Here is the error message from the client machine that accompanies the problem. This message repeats verbatim on every login by an infected account. When applying the workaround this message is not present.


4/21/13 3:52:50.937 PM KernelEventAgent[42]: tid 00000000 received event(s) VQ_NOTRESP (1)

4/21/13 3:52:50.938 PM KernelEventAgent[42]: tid 00000000 type 'afpfs', mounted on '/Network/Servers/XXXXX/Users', from '//user@XXXXX/Users', not responding

4/21/13 3:52:50.000 PM kernel[0]: ASP_TCP Disconnect: triggering reconnect by bumping reconnTrigger from curr value 0 on so 0xffffff8026a6d370

4/21/13 3:52:50.000 PM kernel[0]: ASP_TCP asp_tcp_usr_control: invalid kernelUseCount 0

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect started /Network/Servers/XXXXX/Users prevTrigger 0 currTrigger 1

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: doing reconnect on /Network/Servers/XXXXX/Users

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: posting to KEA EINPROGRESS for /Network/Servers/XXXXX/Users

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: Max reconnect time: 600 secs, Connect timeout: 15 secs for /Network/Servers/XXXXX/Users

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Network/Servers/XXXXX/Users

4/21/13 3:52:50.939 PM KernelEventAgent[42]: tid 00000000 found 1 filesystem(s) with problem(s)

4/21/13 3:52:50.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: Logging in with uam 10 /Network/Servers/XXXXX/Users

4/21/13 3:52:51.000 PM kernel[0]: AFP_VFS afpfs_DoReconnect: Restoring session /Network/Servers/XXXXX/Users

Apr 23, 2013 12:41 PM in response to Nicholas Woolridge

Yes, you can reenable spotlight as long as you rebuild the spotlight index. http://osxdaily.com/2012/01/17/rebuild-spotlight-index/


Make sure that, in addition to rebuilding the local hard drive index, you also rebuild the network home directory index in the same fashion (drag the home directory into Spotlight's Privacy list and then remove it).


I believe the issue occurs when the Spotlight indexer gets stuck on a file located on the server. You may find it crops up again, in which case you'll have to repeat this process.
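A command-line alternative to the Privacy-pane method is mdutil, where -E erases and rebuilds a volume's index and -s reports its indexing status. The sketch below only prints the commands for each volume path passed to it. The function name rebuild_index_cmds is made up for illustration, and whether mdutil behaves on an AFP-mounted home volume the same way it does on a local disk is an assumption that nobody in this thread has confirmed.

```shell
# Print (do not run) mdutil commands that erase/rebuild and then report the
# Spotlight index for each volume path given as an argument.
# Assumption: mdutil accepts the network home mount the same way as a local volume.
rebuild_index_cmds() {
    for vol in "$@"; do
        printf 'sudo mdutil -E "%s"\n' "$vol"
        printf 'sudo mdutil -s "%s"\n' "$vol"
    done
}
```

For example, rebuild_index_cmds / prints the two commands for the local boot volume; the network home mount point can be passed as a second argument.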

Apr 24, 2013 1:07 PM in response to cafarom

I can report that disabling Spotlight does solve this problem for us, which is great. However, when we re-enable Spotlight on a client machine, the freezing immediately returns. This is after disabling Spotlight, re-enabling it, and rebuilding the Spotlight indices. We performed these steps on the server as well, just in case.


This is, of course, a big problem, since spotlight is pretty core functionality for regular computer use. Users on the "fixed" machines cannot search for files or applications.


So disabling Spotlight is a stopgap. I am considering designating some lab machines as "Spotlight disabled" so that affected users have machines they can work on.


This really needs to be fixed by Apple.

Apr 25, 2013 7:53 AM in response to Nicholas Woolridge

I did not have any luck with rebuilding the Spotlight indexes either. In my case as well, the only way is to leave Spotlight permanently disabled, which, as you said, is a huge problem that Apple really needs to fix.


Nicholas, Cafarom, if you haven't done so already, I would highly encourage you to submit your own bug reports here: https://bugreport.apple.com


Apple prioritizes their work based on the number of bug reports received, so your added information should help them solve the problem faster.
