OK, after putting our best people on this, digging through logs and testing things out on a few systems, I think we've got it nailed down. I've described what we found below in the hope that it helps someone else at some point. Our development team has also filed a bug report with Apple. I'm not sure this would officially qualify as a bug, but it certainly caused us huge headaches, and the systems weren't behaving the way we would have expected them to.
Symptom:
Mobile account users are locked out of their machines when they are not on the local network that hosts the Open Directory server. The OD logs on affected machines contain a series of error messages reading "failed to set offline password for user".
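To find affected users without reading through the full debug output, a small Python sketch like the one below can pull user names out of that failure message. The quoted-user-name format comes from the log lines later in this post; the log path `/var/log/opendirectoryd.log` is an assumption for this generation of OS X, so adjust it for your system.

```python
import re
from typing import Iterable, List

# Matches the ConfigurationProfiles message seen on our machines, e.g.
# "... - failed to set offline password for user 'jdoe', they will be out of sync"
OFFLINE_FAILURE = re.compile(r"failed to set offline password for user '([^']+)'")


def users_with_offline_failures(lines: Iterable[str]) -> List[str]:
    """Return user names (in first-seen order) whose cached password update failed."""
    seen: List[str] = []
    for line in lines:
        match = OFFLINE_FAILURE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Assumed location of the opendirectoryd log; not confirmed for every OS X version.
    log_path = "/var/log/opendirectoryd.log"
    try:
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            for user in users_with_offline_failures(handle):
                print(f"offline password out of sync for: {user}")
    except OSError as err:
        print(f"could not read {log_path}: {err}")
```

Any name this prints is a user whose on-network and off-network passwords have probably diverged.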
Details of what causes the problem:
The password policy on the Open Directory server does not fully agree with the local password policy on the affected user machine (in our case, a policy put in place via Profile Manager).
What we observed:
We first turned on detailed logging for OD on impacted machines using the following command:
odutil set log debug
This produces extremely verbose logs that are tedious to work through, but they eventually led us to the issue. In summary, after a user changed their password (either manually in the preferences panel or when prompted at login), the following sequence of events took place every time they logged in to or unlocked the computer while on our network (with log snapshots to highlight the key steps):
- The laptop communicates via LDAPv3 with the OS X server and successfully validates the user's login credentials. A huge number of log entries are generated during this and the subsequent processes, but at a high level the laptop is talking to the OS X server and both sides agree that valid credentials have been presented. Sample logs (slightly redacted):
2014-10-02 09:33:47.279386 EDT - 4965.358324.358327, Node: /LDAPv3/fqdn.company.co, Module: AppleODClientLDAP - valid credentials for xxxxxx@fqdn.company.co
2014-10-02 09:33:47.373012 EDT - 4965.358324.358327, Node: /LDAPv3/fqdn.company.co, Module: AppleODClientLDAP - successful authentication for authzid - 'dn:uid=xxxxxx,cn=users,dc=fqdn,dc=company,dc=co' authcid - 'xxxxxx'
2014-10-02 09:33:47.373073 EDT - 4965.358324.358327, Node: /LDAPv3/fqdn.company.co, Module: AppleODClientLDAP - attached credential to connection - username 'xxxxxx' metaname 'uid=xxxxxx,cn=users,dc=fqdn,dc=company,dc=co' type 'dsRecTypeStandard:Users' authorities '2'
2014-10-02 09:33:47.373258 EDT - 4965.358324.358327, Node: /LDAPv3/fqdn.company.co, Module: AppleODClientLDAP - Audit - success - Verify password for record type Users 'xxxxxx' node '/LDAPv3/fqdn.company.co'
2014-10-02 09:33:47.373475 EDT - 4965.358324.358329 - ODRecordChangePassword request, NodeID: D4DDF601-95CB-4DAF-8B4B-EE054F4DA86C, RecordType: dsRecTypeStandard:Users, Record: xxxxxx, MetaRecordName: uid=xxxxxx,cn=users,dc=fqdn,dc=company,dc=co
2014-10-02 09:33:47.373576 EDT - 4965.358324, Node: /Local/Default - Audit - success - Verify password for record type Users 'xxxxxxx' node '/Local/Default'
2014-10-02 09:33:47.373808 EDT - 4965.358324.358329.358330 - ODQueryCreateWithNode request, NodeID: D4DDF601-95CB-4DAF-8B4B-EE054F4DA86C, RecordType(s): dsRecTypeStandard:Users, Attribute: dsAttrTypeStandard:RecordName, MatchType: EqualTo, Equality: CaseIgnore, Value(s): xxxxxxx, Requested Attributes: dsAttrTypeStandard:AppleMetaRecordName,dsAttrTypeStandard:AuthenticationAuthority,dsAttrTypeStandard:PasswordPolicyOptions,dsAttrTypeStandard:Password,dsAttrTypeStandard:GeneratedUID,dsAttrTypeStandard:UniqueID,dsAttrTypeStandard:RecordType,dsAttrTypeNative:_aiv,dsAttrTypeNative:_aivts,dsAttrTypeNative:ShadowHashData,dsAttrTypeStandard:RecordName, Max Results: 1
2014-10-02 09:33:47.374892 EDT - 4965.358332 - RPC: getpwnam, Module: SystemCache, name: xxxxxxx, rpc_version: 2
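Because the debug logs are so verbose, it helps to split each line into fields and follow a single request. The Python sketch below does that. The line layout is inferred only from the excerpts in this post, not from any documented format. The idea that the dotted IDs nest (e.g. 4965.358324.358329 and 4965.358324.358329.358330) so that a prefix identifies one chain of related work is also our assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Layout inferred from the excerpts above (an assumption, not a documented format):
# "<date> <time> <tz> - <request id>[, Node: <node>][, Module: <module>] - <message>"
LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} [\d:.]+ \w+) - (?P<rid>[\d.]+)"
    r"(?:, Node: (?P<node>[^,]+?))?(?:, Module: (?P<module>\S+))? - (?P<msg>.*)$"
)


@dataclass
class Entry:
    stamp: str
    request_id: str
    node: Optional[str]
    module: Optional[str]
    message: str


def parse(line: str) -> Optional[Entry]:
    """Split one log line into fields; return None for lines that don't fit the layout."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return Entry(m["stamp"], m["rid"], m["node"], m["module"], m["msg"])


def same_chain(entries: List[Entry], prefix: str) -> List[Entry]:
    """Keep entries whose request id equals `prefix` or (assumed) nests under it."""
    return [e for e in entries
            if e.request_id == prefix or e.request_id.startswith(prefix + ".")]
```

Filtering on the ID of the ODRecordChangePassword request, for example, keeps that request's own lines together with any sub-requests issued under it.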
- The laptop starts logging the user in or unlocking the screen.
- In the background, additional processes keep running to check that the locally cached credentials (referenced as LocalCachedUser in the Authentication Authority chain on the local machine) match those stored on the server. This is so that if the user logs in or tries to unlock the laptop while off the network (or while the OS X server is otherwise unavailable), they can still use the laptop without issue.
- The laptop recognizes that the locally cached credentials differ from those in OD for that user and begins updating the locally cached credentials.
- As part of this process, the new credentials are copied down to the laptop and checked against any local password policies. Because the OD password policy and the local password policy did not fully agree, a password could be acceptable to OD but rejected for a local-only user. For a mobile account, however, the new password is accepted at change time (even though it was entered directly on the laptop), and is then rejected during this update of the cached credentials for failing the local password policy. This happens silently, without any warning to the user or administrator apart from a cryptic log entry. The more verbose debug logs below provide more detail.
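The LocalCachedUser link described in the list above lives in the local user record's AuthenticationAuthority attribute. The hedged Python sketch below picks that entry out of a list of authority values. The `;Type;data` layout and the `<node>:<short name>:<GUID>` form of the LocalCachedUser data reflect typical mobile-account records as we understand them and are assumptions. Compare against `dscl . -read /Users/<name> AuthenticationAuthority` on your own machine before relying on it.

```python
from typing import List, NamedTuple, Optional


class CachedUser(NamedTuple):
    node: str        # directory node the account is cached from, e.g. "/LDAPv3/..."
    short_name: str
    guid: str


def local_cached_user(authorities: List[str]) -> Optional[CachedUser]:
    """Find the LocalCachedUser entry in an AuthenticationAuthority list.

    Assumed layout: each value is ";<Type>;<data>", and for LocalCachedUser
    the data is "<node>:<short name>:<GUID>".
    """
    for value in authorities:
        parts = value.split(";", 2)              # ["", Type, data]
        if len(parts) == 3 and parts[1] == "LocalCachedUser":
            pieces = parts[2].rsplit(":", 2)     # split from the right in case the node has ":"
            if len(pieces) == 3:
                return CachedUser(*pieces)
    return None
```

A record that returns None here is not linked to a directory account, so it would not go through the cache update described above.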
2014-10-02 09:33:47.388753 EDT - 4965.358324.358329, Node: /Local/Default, Module: ConfigurationProfiles - CopyPasswordPolicy new policy =
{
allowSimple = 0;
forcePIN = 1;
minComplexChars = 1;
minLength = 7;
requireAlphanumeric = 1;
}
2014-10-02 09:33:47.388758 EDT - 4965.358324.358329, Node: /Local/Default, Module: ConfigurationProfiles - odm_RecordChangePassword has policy
2014-10-02 09:33:47.388789 EDT - 4965.358324.358329, Node: /Local/Default, Module: ConfigurationProfiles - an error of 5402 occurred - Passcode is too simple.
2014-10-02 09:33:47.388849 EDT - 4965.358324.358329, Node: /Local/Default, Module: ConfigurationProfiles - failed to set offline password for user 'xxxxxxxxxx', they will be out of sync
2014-10-02 09:33:47.388857 EDT - 4965.358324.358329, Node: /Local/Default, Module: ConfigurationProfiles - Audit - Password change quality failure (5402) - Modify password for record type Users 'xxxxxxxxxx' node '/Local/Default'
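To see how a password can satisfy OD and still fail the profile policy, the Python sketch below checks a candidate password against the keys dumped in the log excerpt above. The meaning given to each key follows our reading of Apple's passcode payload keys. The exact rules OS X applies, especially what counts as "simple" (here: three repeated, ascending, or descending characters in a row), are assumptions, so treat the output as indicative only.

```python
from typing import Dict, List

# Keys copied from the profile policy in the log excerpt above; the
# interpretation of each key below is an assumption, not Apple's code.
LOCAL_POLICY: Dict[str, int] = {
    "allowSimple": 0,
    "forcePIN": 1,           # only requires that a passcode is set; not checked here
    "minComplexChars": 1,    # characters that are neither letters nor digits
    "minLength": 7,
    "requireAlphanumeric": 1,
}


def is_simple(password: str, run: int = 3) -> bool:
    """Assumed definition: `run` repeated, ascending, or descending characters in a row."""
    for i in range(len(password) - run + 1):
        codes = [ord(c) for c in password[i:i + run]]
        steps = {b - a for a, b in zip(codes, codes[1:])}
        if steps in ({0}, {1}, {-1}):
            return True
    return False


def violations(password: str, policy: Dict[str, int]) -> List[str]:
    """Return the names of the rules the password breaks (empty list = accepted)."""
    broken = []
    if len(password) < policy.get("minLength", 0):
        broken.append("minLength")
    if policy.get("requireAlphanumeric") and not (
        any(c.isalpha() for c in password) and any(c.isdigit() for c in password)
    ):
        broken.append("requireAlphanumeric")
    if sum(1 for c in password if not c.isalnum()) < policy.get("minComplexChars", 0):
        broken.append("minComplexChars")
    if not policy.get("allowSimple", 1) and is_simple(password):
        broken.append("allowSimple")
    return broken
```

For example, a hypothetical server policy that only asks for eight characters would accept "Summer2014", but `violations("Summer2014", LOCAL_POLICY)` reports `["minComplexChars"]`, which is exactly the kind of mismatch that leaves the cached password behind.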
From this point on, only the new OD password works while the laptop is on the network, and only the old cached password works when it is off the network. The user doesn't know this, so it simply appears that their password is being rejected. In one of our cases the user didn't even know the 'old' password, because it was a temporary password our IT team had set up on the new machine they were issued.
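The split state described above can be modelled in a few lines. The Python sketch below is a hypothetical toy, not Apple's implementation. It keeps two copies of the password, lets the server copy change, refuses the cache update when a local policy check fails (returning quietly, as we observed), and then shows which password works on and off the network. Passwords are compared in plain text purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MobileAccount:
    """Toy model (hypothetical) of a mobile account's two copies of a password."""
    server_password: str   # what Open Directory holds
    cached_password: str   # the LocalCachedUser copy used off the network

    def sync_cache(self, local_policy_ok: Callable[[str], bool]) -> bool:
        """Copy the server password into the cache only if local policy accepts it.

        On failure this just returns False; in the real system the only trace
        was the "failed to set offline password" log line.
        """
        if local_policy_ok(self.server_password):
            self.cached_password = self.server_password
            return True
        return False

    def login(self, password: str, on_network: bool) -> bool:
        """On the network the server copy decides; off the network only the cache does."""
        expected = self.server_password if on_network else self.cached_password
        return password == expected
```

After a rejected sync, `login(new, on_network=True)` succeeds while `login(new, on_network=False)` fails, and only the old temporary password still opens the machine off the network.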
In our opinion, what failed here is the order of operations: the system should run the local checks before attempting to update the credentials in OD. That is, when a user changes their password, the new password should be checked against any local policies first and then against the remote policies. If the checks run in that order, the user can't get locked out. Running them in the reverse order, and then failing silently when there's a problem, is what lets users get locked out of their machines.
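The ordering we're proposing can be sketched in Python as follows. Every callable here (`local_checks`, `remote_checks`, `commit_remote`, `commit_local`) is a hypothetical hook standing in for whatever OS X actually does; the point is only that both policies are checked before anything is written, and that a rejection is returned to the caller instead of being swallowed.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], List[str]]   # hypothetical: returns a list of rule violations


def change_password(new_password: str,
                    local_checks: Check,
                    remote_checks: Check,
                    commit_remote: Callable[[str], None],
                    commit_local: Callable[[str], None]) -> Tuple[bool, List[str]]:
    """Validate against local policy, then remote policy, and only then write anything."""
    problems = local_checks(new_password)
    if problems:
        return False, problems          # refuse before OD is touched, and say why
    problems = remote_checks(new_password)
    if problems:
        return False, problems
    commit_remote(new_password)         # server copy first...
    commit_local(new_password)          # ...then the cache, which local policy already accepted
    return True, []
```

With this order a password like "Summer2014" is refused at the password prompt with a visible reason, rather than being accepted by OD and then silently dropped from the local cache.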
Anyway, I hope that proves useful to someone else at some point.