
Clients randomly losing access to shared volumes

Hello all:

We've started experiencing a problem with our server, and I'm just wondering if someone here might have some ideas.

The Server is 10.4.8; it's more or less up-to-date, except for the last two security updates, which we'll apply as soon as possible (but I don't think our problem is related).

Our 10.4 clients (they're all either 10.4.7 or 10.4.8) are randomly losing access to one or more volumes shared by the server. They'll be logged in, and when they try to save a file to a network share, they either get a message that they're no longer connected or one saying they don't have permission to access the share point. Logging out and back in, or restarting, puts everything back to normal, but it's becoming a major irritant.

This is a dedicated file server; the only service running on it is AFP, and all files have permissions inherited from their parent folder.

Andrew Penner

14" G3 iBook (700 MHz, 640MB RAM), Mac OS X (10.4.8)

Posted on Dec 29, 2006 10:24 AM


Dec 29, 2006 3:53 PM in response to Andrew Penner

I think that will sound familiar to a lot of us 😟

There was a previous (very long) thread on it here...

http://discussions.apple.com/thread.jspa?threadID=363843&tstart=0&messageID=3684465#3684465

I suggest starting at the end and working backward. There's been mixed response to the suggestions made.

Can I also suggest that any further discussion on this topic continue in this new thread rather than being added to the old one, which (I think) has grown a bit too long? In fact, if anyone has summarising notes from the previous thread, they would be really useful!

-david

Dec 29, 2006 3:59 PM in response to Andrew Penner

What do the system.log files on both the client and the server, and the AFP log in Server Admin, have to say about one of these events?

Isn't there an idle timeout that can be set for AFP also? Maybe they're just being bitten by that?

Roger

Dec 29, 2006 4:38 PM in response to David_x

Some links from previous thread...

Mac OS X Server: AppleShare performance tuning for client and server
(see the server tuning section for MaxThreads tweaking)
http://docs.info.apple.com/article.html?artnum=304106

Mac OS X Server: Understanding the "disconnect when idle" feature for AFP connections
(suggested in AFP548 article, next linked)
http://docs.info.apple.com/article.html?artnum=301591

AFP548: AFP. It ain't so bad....
http://www.afp548.com/article.php?story=20060329213629494
(see also followup article linked to at top of that one)

/etc/memberd.conf tweak
Not sure if this is referenced in one of the above, but it was a strong suggestion in the previous thread...

Change to these reduced values...
DefaultExpirationInSecs 360
DefaultFailureExpirationInSecs 180
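If you'd rather script the change than hand-edit the file, here is a minimal Python sketch. It assumes memberd.conf holds one "Key value" pair per line, with comment lines starting with # (check your own file first); patch_conf and NEW_VALUES are hypothetical names. It prints the edited text instead of overwriting the file, so you can compare it against the original and keep a backup.

```python
import re

# Reduced values suggested above. The one-"Key value"-per-line format is an
# assumption about memberd.conf, not something taken from Apple documentation.
NEW_VALUES = {
    "DefaultExpirationInSecs": "360",
    "DefaultFailureExpirationInSecs": "180",
}

LINE_RE = re.compile(r"^(\s*)(\w+)(\s+)(\S+)(.*)$", re.DOTALL)


def patch_conf(text, new_values):
    """Replace the value on each 'Key value' line whose key is in new_values.

    Comment lines (starting with #) never match the key pattern, and keys not
    listed in new_values are left exactly as they were.
    """
    out = []
    for line in text.splitlines(keepends=True):
        m = LINE_RE.match(line)
        if m and m.group(2) in new_values:
            indent, key, sep, _old_value, rest = m.groups()
            line = f"{indent}{key}{sep}{new_values[key]}{rest}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    with open("/etc/memberd.conf") as conf:
        print(patch_conf(conf.read(), NEW_VALUES), end="")
```

Only exact key matches are touched, so a similarly named key elsewhere in the file keeps its value.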

Misc
(I had this noted - not sure how old it was)
Spanning Tree Protocol on Cisco switches has been known to disrupt AFP, so the problem may not actually be AFP on the server. Run some diagnostics on your network.

-david


Server 10.4.8

Jan 2, 2007 9:39 AM in response to Community User

Hi Roger, thanks for getting back to me. I haven't had a chance yet to peruse the links David provided, but here's what I can give you in response to your question.

AFP Access:
We use secured access to the server, and we have selected the option to allow clients to sleep 24 hours without showing as idle.

AFP log sample for two users:
IP 192.168.1.143 - - [02/Jan/2007:09:57:33 -0700] "Login GPL-CS-008" 0 0 0
IP 192.168.1.143 - - [02/Jan/2007:10:01:33 -0700] "Session Network Error Disconnect: " 0 0 0
IP 192.168.1.143 - - [02/Jan/2007:10:01:33 -0700] "Saved for Reconnect User: GPL-CS-008" 1165605234 3168 0
IP 192.168.1.117 - - [02/Jan/2007:10:05:39 -0700] "Login GPL-CS-001" 0 0 0
IP 192.168.1.117 - - [02/Jan/2007:10:07:29 -0700] "Logout GPL-CS-001" 0 0 0
IP 192.168.1.117 - - [02/Jan/2007:10:11:43 -0700] "Session Network Error Disconnect: " 0 0 0
IP 192.168.1.117 - - [02/Jan/2007:10:11:43 -0700] "Saved for Reconnect User: GPL-CS-001" 1165605234 3000 0
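If the disconnects cluster on a few client IPs, that points toward particular clients or network paths rather than the server as a whole. Here's a minimal Python sketch for tallying them. It assumes every AFP log line has the shape of the seven lines above (that format is inferred from this sample only), and parse_afp_line and disconnects_per_ip are hypothetical helper names.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Line shape inferred from the sample above:
# IP <addr> - - [<dd/Mon/yyyy:HH:MM:SS zone>] "<message>" <n> <n> <n>
LINE_RE = re.compile(r'^IP (?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] "(?P<msg>[^"]*)"')


def parse_afp_line(line):
    """Return (ip, aware datetime, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("msg").strip()


def disconnects_per_ip(lines):
    """Count 'Session Network Error Disconnect' events per client IP."""
    counts = Counter()
    for line in lines:
        rec = parse_afp_line(line)
        if rec and rec[2].startswith("Session Network Error Disconnect"):
            counts[rec[0]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python afp_disconnects.py <path-to-AFP-access-log>
    with open(sys.argv[1]) as log:
        for ip, n in disconnects_per_ip(log).most_common():
            print(ip, n)
```

Run against the seven lines above, it reports one network-error disconnect each for 192.168.1.143 and 192.168.1.117; a longer log is needed before any pattern means much.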

The system log for GPL-CS-001 showed the same three share points being mounted and dismounted (they should have 10 mounted share points). I can get the log from that computer a little later.

Andrew Penner

Jan 2, 2007 4:41 PM in response to Andrew Penner

Unfortunately, the only thing that caught my eye from that log snippet is the "Saved for Reconnect" message's first integer field. It looks like a time to me, so I checked it:

amtime1970 -d 1165605234
Fri Dec 8 14:13:54 2006
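The wall-clock reading of that field depends on the time zone it's rendered in. This small Python sketch (describe is a hypothetical helper) shows the same value in UTC, in UTC-5 (which reproduces the 14:13:54 reading above), and in the -0700 offset that appears in Andrew's AFP log. On OS X, the BSD date -r 1165605234 does the same conversion in the machine's local time zone.

```python
from datetime import datetime, timedelta, timezone

# First integer field of the "Saved for Reconnect" log lines.
STAMP = 1165605234


def describe(epoch, utc_offset_hours):
    """Render a Unix timestamp at a fixed UTC offset, in date(1)-like form."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch, tz).strftime("%a %b %d %H:%M:%S %Y")


print(describe(STAMP, 0))   # Fri Dec 08 19:13:54 2006 (UTC)
print(describe(STAMP, -5))  # Fri Dec 08 14:13:54 2006 (matches the reading above)
print(describe(STAMP, -7))  # Fri Dec 08 12:13:54 2006 (the log's -0700 offset)
```

So on a server logging at -0700, the timestamp falls at about 12:14 PM local time on December 8.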

Does December 8 have any significance, like the last server reboot?

Roger

Jan 3, 2007 4:15 AM in response to Andrew Penner

Any idea on the reasoning behind reducing the DefaultExpirationInSecs and DefaultFailureExpirationInSecs values?


I can't remember offhand; I think it was discussed in the previous forum thread.

This line from your log ...

IP 192.168.1.117 - - [02/Jan/2007:10:11:43 -0700] "Session Network Error Disconnect: " 0 0 0

...is similar to ones I get, although mine seem to be related to "sleep request" log messages five minutes earlier. The other log messages look entirely consistent with 'normal' logs.
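If disconnects reliably follow some earlier event by a fixed delay, measuring the gap per client makes that visible. A minimal Python sketch, assuming the log has already been split into (ip, datetime, message) tuples; disconnect_gaps is a hypothetical helper, and the marker string is copied from the log lines above.

```python
from datetime import datetime


def disconnect_gaps(events, marker="Session Network Error Disconnect"):
    """For each disconnect, report how long after the same client's previous
    logged event it occurred.

    events: iterable of (ip, datetime, message) tuples, in any order.
    Returns a list of (ip, previous_message, gap) tuples in time order.
    """
    last_seen = {}
    gaps = []
    # sorted() is stable, so lines sharing a timestamp keep their log order.
    for ip, ts, msg in sorted(events, key=lambda e: e[1]):
        if msg.startswith(marker) and ip in last_seen:
            prev_ts, prev_msg = last_seen[ip]
            gaps.append((ip, prev_msg, ts - prev_ts))
        last_seen[ip] = (ts, msg)
    return gaps


def _at(hms):
    """Build a datetime on 2 Jan 2007 from 'HH:MM:SS' (example data only)."""
    return datetime.strptime("2007-01-02 " + hms, "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    events = [  # Andrew's log lines from earlier in the thread
        ("192.168.1.143", _at("09:57:33"), "Login GPL-CS-008"),
        ("192.168.1.143", _at("10:01:33"), "Session Network Error Disconnect:"),
        ("192.168.1.117", _at("10:05:39"), "Login GPL-CS-001"),
        ("192.168.1.117", _at("10:07:29"), "Logout GPL-CS-001"),
        ("192.168.1.117", _at("10:11:43"), "Session Network Error Disconnect:"),
    ]
    for ip, prev, gap in disconnect_gaps(events):
        print(ip, prev, gap)
    # 192.168.1.143 Login GPL-CS-008 0:04:00
    # 192.168.1.117 Logout GPL-CS-001 0:04:14
```

In Andrew's sample, both disconnects land roughly four minutes after the client's previous logged event; whether that holds up over a longer log is the interesting question.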

Roger, interesting observation on the integer field; I'd never really noticed it before. I had a look at my own logs and they all have a similar field. My timestamp converted to October (using the "date -r" command), which is certainly earlier than the last reboot. Maybe the last service update?

-david

Server 10.4.8

Jan 3, 2007 3:53 PM in response to David_x

Thank you for the -r switch. I hadn't read OS X's date man page, so I didn't know about it.

The reboot is only an idea off the top of my head. Why does that field, which looks like a timestamp, show a date a month ago? If two clients hadn't had the exact same timestamp, I'd have guessed it was client-related.

As far as a service update goes, ls -lrt /Library/Receipts should answer that.

TBH, after all that's gone on in those other posts, I'm just following anything that might be interesting. Nobody's got it yet, so putzing around stands as good a chance as anything.

Roger

Jan 4, 2007 1:11 PM in response to David_x

Hi David:

Well, at this point, I'm cautiously optimistic. It's been two days since I followed the tip about reducing the DefaultExpirationInSecs and DefaultFailureExpirationInSecs values, and no one has reported getting punted from a share point.

I'm going to leave this open for a few more days to see how stable this is.

Roger: the only significance to the Dec. 8 date that I can think of is that it might be the day we tried using ACLs to resolve a different access issue. (This was an experiment that went horribly wrong in our current environment. I'm not saying that ACLs don't work, but trying to add them to a system designed to have files inherit their permissions from their parent folder didn't work for us. I suspect ACLs would work great if you designed the system to use them from the start.)

Andrew

Jan 4, 2007 1:32 PM in response to Community User

I had a look at /Library/Receipts and compared it to the last two different datestamps (they stay the same for weeks or months at a time). The second-to-last one matches, within a couple of minutes, a set of updates (including QuickTime, which needs a restart), but the latest datestamp doesn't correspond to anything in Receipts. That doesn't mean there was NOT a restart; or maybe it was just an AFP service restart? I'll keep an eye on it...
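That comparison can be scripted. A minimal Python sketch, assuming a receipt's modification time approximates when its update was installed (the same assumption ls -lrt relies on); receipts_near is a hypothetical helper, and the five-minute window is an arbitrary choice.

```python
import os
import time


def receipts_near(epoch, receipts_dir="/Library/Receipts", window_secs=300):
    """List receipts whose modification time is within window_secs of epoch.

    Returns (name, mtime) pairs, closest to epoch first.
    """
    hits = []
    with os.scandir(receipts_dir) as entries:
        for entry in entries:
            mtime = entry.stat().st_mtime
            if abs(mtime - epoch) <= window_secs:
                hits.append((entry.name, mtime))
    return sorted(hits, key=lambda hit: abs(hit[1] - epoch))


if __name__ == "__main__":
    # 1165605234 is the "Saved for Reconnect" value from Andrew's log.
    for name, mtime in receipts_near(1165605234):
        print(time.ctime(mtime), name)
```

An empty result means nothing was installed near that moment, which (as above) still doesn't rule out a plain restart.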

-david

PS. Andrew, if it's OK with you, just leave the thread open; I'm sure others will be looking for the same kind of discussion 😟

Jan 4, 2007 4:33 PM in response to Andrew Penner

FWIW, I would expect that turning on ACLs would require a kernel extension to be loaded to support them. If they're turned off, is that kernel extension unloaded properly? Would it remain in the kext.cache to persist across reboots?

I wonder if kextstat might shed some light?

Roger

Jan 5, 2007 4:40 PM in response to Andrew Penner

Does that mean that if you run:

uptime

in a shell, the uptime may trace back to Fri Dec 8 14:13:54 2006?
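One way to check that without doing date arithmetic by hand: read the days and hours off uptime, note when you ran it, and test whether the implied boot time lands near the reconnect timestamp. A minimal Python sketch; boot_matches and its ten-minute slack are hypothetical choices, and the uptime figure is typed in by hand rather than parsed, since that output's wording varies with how long the machine has been up.

```python
from datetime import datetime, timedelta, timezone

# First integer field of the "Saved for Reconnect" lines.
SAVED_STAMP = 1165605234


def boot_matches(reported_uptime, now, stamp=SAVED_STAMP,
                 slack=timedelta(minutes=10)):
    """Return True if a boot at `stamp` explains `reported_uptime` at `now`.

    reported_uptime: timedelta read by hand from the uptime output.
    now: timezone-aware datetime at which uptime was run.
    """
    boot_from_uptime = now - reported_uptime
    boot_from_stamp = datetime.fromtimestamp(stamp, timezone.utc)
    return abs(boot_from_uptime - boot_from_stamp) <= slack


if __name__ == "__main__":
    # Hypothetical reading of "24 days, 21:57" taken at the moment of the
    # 10:11:43 -0700 disconnect in Andrew's log.
    when = datetime(2007, 1, 2, 10, 11, 43, tzinfo=timezone(timedelta(hours=-7)))
    print(boot_matches(timedelta(days=24, hours=21, minutes=57), when))  # True
```

In other words, at the time of that disconnect, a machine booted at the reconnect timestamp would have shown an uptime of about 24 days and 22 hours.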

BTW, what you were describing is the effect of ACLs on the file system, not the possibility of a kernel module being added to handle the ACLs. The kext idea would really only be valid if this behaviour started after playing around with the ACLs. So, when did this start?

Roger

Jan 7, 2007 8:13 PM in response to Andrew Penner

From reading your symptoms, I would suggest checking the following:

1. Check your network. You could have a malfunctioning switch or cable between the client and server. I've actually tracked AFP disconnects down to one miswired Ethernet cable on a small network.

2. Check the filesystem integrity on the server volumes that house your share points. Use Disk Utility's Repair Disk button (or diskutil repairVolume /Volumes/mount-point). You'll have to stop all file services that access that volume.

--Gerrit
