Q: ACLs & 10.7.3
So I stupidly made the mistake of upgrading my relatively small Xsan to 10.7.3. Previously it was sitting at 10.7.2 and seemed fairly stable. (The MDCs were on 10.7.2 but the clients were on 10.5.8.) We recently purchased a new J-class expansion array and grew our volume onto it. At that time, a decision was made to bring all the clients up to Lion and FCS3. We performed the grow and then upgraded all of the clients. Shortly afterwards, I realized the clients were all sitting at 10.7.3. (Software Update had pulled down the latest release.) With Xsan best practices in mind, I decided to go ahead and update my MDCs as well, to keep them at the same version as the highest client.
Big mistake.
The Xsan volume did not like this. After the upgrade, my volume stopped accepting ACLs and AD permissions. As of right now, ACLs are completely broken. If I put any kind of ACL on the volume, none of my clients logged in with domain accounts can write files to it. They can read the volume; they just can't write to it. If they try to edit, delete, or create files or folders, they get an "Error code -43" message. If I remove the ACLs (clicking the minus sign on all ACLs under "Set Permissions") and fall back to POSIX permissions alone ("everyone" set to read/write), the clients work just fine. They can read, write, edit, and delete files all day long. But with ACLs on the volume, it's worthless.
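For anyone who wants to reproduce the symptom from Terminal instead of the Finder, here is a minimal Python sketch of a write probe. It is only an illustration: the function name `can_write` and the scratch-file prefix are made up for this example, and it reports the POSIX errno from the failed call, which is not the same numbering as the Finder's -43.

```python
import os
import sys
import tempfile


def can_write(directory):
    """Try to create and then delete a scratch file in `directory`.

    Returns (True, None) on success, or (False, errno) if either step fails.
    """
    try:
        # mkstemp creates a uniquely named file and returns an open descriptor.
        fd, path = tempfile.mkstemp(prefix=".acl_probe_", dir=directory)
        os.close(fd)
        os.remove(path)
    except OSError as exc:
        return False, exc.errno
    return True, None


if __name__ == "__main__":
    # Pass the folder to test as the first argument; defaults to the
    # current directory. Run it as the affected domain user.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    ok, err = can_write(target)
    if ok:
        print("%s: writable" % target)
    else:
        print("%s: NOT writable (errno %s)" % (target, err))
```

Running it once with the ACLs in place and once with them removed should show the same read-only behavior described above, assuming the problem is not specific to the Finder.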
All of my machines are bound to AD, and they all seem perfectly fine. I can log in with domain accounts and don't have any problems. The machines all show green lights next to the domain in System Preferences. I can set permissions on local folders and everything seems great.
I've tried practically everything I can think of. I've removed clients from the SAN and re-added them. I've unbound machines from AD and rebound them. I've unbound both MDCs and rebound them. I wiped one of the clients and did a fresh Lion install from scratch. I've added a single AD account (instead of our normal security group) to the ACLs section and still no luck. I still get that stupid -43 error. I've even turned ACLs off in Volume Settings and then back on. No luck. Today I tried rolling back one of the clients to 10.7.2 (leaving the MDCs at 10.7.3) and still had the same issue.
Looking through the logs, I'm not seeing anything suspicious, though I will say that I'm fairly new to Mac logs. The only thing I see that might be of any concern is a few annoying Spotlight errors (even though Spotlight search is unchecked in Volume Settings).
At this point the only idea I have left is to roll back both of my MDCs to 10.7.2. I'm planning to try that tomorrow evening to see if I have any luck. If anyone has any advice, I would greatly appreciate it. Lion (at least where Xsan is concerned) still seems to be in beta, and I seem to be one of only a few beta testers.
Xsan 2.3, Mac OS X (10.7.3)
Posted on Feb 15, 2012 7:58 PM
I just got off the phone with Apple's Xsan tech support. We got it fixed. There's a small bug that caused this, so I'm going to document it here for future reference and for anyone else on 10.7.3 who is having the same problem.
The problem turned out to be the POSIX group that owned the volume. Our volume was owned by a Unix group that only existed on the MDCs and not on the clients. (We had changed it to something other than wheel/admin a while back for various reasons.) For some reason the clients were getting hung up and erroring because they had no idea what or who this group was. As soon as I set it back to admin/wheel, everything started working again, ACLs and all. The error code -43 went away.
The Apple engineer said the POSIX group owner needs to be set to something that both the clients and the MDCs can recognize. He said I could create this group on all the clients, set it back to admin/wheel, or, even better, set it to an Active Directory group that all the clients and MDCs recognize (which is what I did).
So there you go. He did confirm that this is a bug in 10.7.3. He said that, by design, the clients aren't really supposed to care about POSIX permissions if ACLs exist. If Xsan encounters a POSIX group and/or owner ID that it doesn't recognize, it's not supposed to care. But for some reason 10.7.3 does care and fails with error code -43. He said a future release should fix this problem.
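If you want to check whether a given machine can resolve the volume's POSIX owner and group before changing anything, something like the Python sketch below should do it. This is my own illustration, not something Apple gave me: the function name `owner_resolves` is made up, and `/Volumes/MyXsanVolume` is a placeholder path, so substitute your own mount point. It uses the standard `pwd` and `grp` modules, which raise `KeyError` when an ID has no matching account on that machine.

```python
import grp
import os
import pwd
import sys


def owner_resolves(path, user_lookup=pwd.getpwuid, group_lookup=grp.getgrgid):
    """Report whether the owner and group IDs of `path` resolve to names here.

    Returns {"user": (uid, name_or_None), "group": (gid, name_or_None)}.
    A name of None means this machine does not recognize that ID.
    """
    st = os.stat(path)
    result = {}
    for kind, ident, lookup in (
        ("user", st.st_uid, user_lookup),
        ("group", st.st_gid, group_lookup),
    ):
        try:
            # Both pwd and grp entries carry the name in field 0.
            result[kind] = (ident, lookup(ident)[0])
        except KeyError:
            result[kind] = (ident, None)
    return result


if __name__ == "__main__":
    # Placeholder mount point; pass your real volume path as the argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/Volumes/MyXsanVolume"
    for kind, (ident, name) in owner_resolves(target).items():
        status = name if name is not None else "UNKNOWN on this machine"
        print("%s id %d: %s" % (kind, ident, status))
```

Run it on an MDC and on a client. In our case I would expect the group to come back with a name on the MDCs and as unknown on the clients, which matches what the engineer described.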
Posted on Feb 16, 2012 9:14 PM