Kerberos/LDAP failing, guru needed
Hello nutcrackers, I have a serious problem on Yosemite Server.
Some background info first; I'll try to keep it as short as possible for readability:
We are serving some 40 users through an Xserve, mostly network accounts and some mobile accounts. It has been running OS X Server (10.6.x) ever since we got it, but it was never 100% stable. Because of this and other reasons we decided to upgrade to Yosemite via Mountain Lion, which was done about a month ago.
At first I tried to install ML over the old OS, but the migration failed miserably, with logon failures, slow logons etc. With little time to troubleshoot, I wiped the system drive, did a fresh install of ML and tried to restore the OD master from an archive made in 10.6, which did not work (failed to import). I then booted into the old 10.6 system, which I had backed up to an external drive, and exported the users and groups from Workgroup Manager; this did work.
During the week things did not look great: users experienced lockups/freezing more than ever. Combined with a bug in ML Server, where disconnecting users through the Server app would not destroy the session and so left them unable to log in again, this made us decide to move on to Yosemite. This time the upgrade seemed to work fine, users could log in and we have experienced very few frozen sessions.
However, something seems to have gone severely wrong in the migration of the LDAP and/or Kerberos data. The first indication was that the syslog was littered with:
Jul 2 15:54:39 ourserver.our-domain.top kdc[260]: AS-REQ network_user@OURSERVER.OUR-DOMAIN.TOP from 192.168.100.241:59543 for krbtgt/OURSERVER.OUR-DOMAIN.TOP@OURSERVER.OUR-DOMAIN.TOP
Jul 2 15:54:39 ourserver.our-domain.top sandboxd[283] ([260]): kdc(260) deny file-read-data /private/etc/krb5.conf
Jul 2 15:54:39 ourserver.our-domain.top kdc[260]: UNKNOWN -- network_user@OURSERVER.OUR-DOMAIN.TOP: no such entry found in hdb
The second line (the sandboxd deny) is expected, or at least I get the same log entry on a fresh test server set up on a different machine. But the UNKNOWN entry is alarming.
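For anyone wanting to look at the same thing: a minimal sketch of how the kdc lines can be pulled out of a syslog like the one above. The function name kdc_entries is made up for illustration, and the log path in the comment is an assumption, not something confirmed for this server.

```shell
# Print only the lines logged by the kdc process itself, i.e. lines
# containing " kdc[<pid>]: ". Lines logged by other processes about kdc
# (such as the sandboxd deny) are not matched.
# Reads the files given as arguments, or standard input if there are none.
kdc_entries() {
    grep -E ' kdc\[[0-9]+\]: ' "$@"
}

# Example call (path is an assumption, adjust to your system):
# kdc_entries /var/log/system.log
```

Applied to the three lines quoted above, this keeps the AS-REQ and UNKNOWN lines and drops the sandboxd one.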
So I started mucking about with Kerberos, on the server:
klist: krb5_cc_get_principal: No credentials cache file found
kinit: krb5_get_init_creds: Client (local_admin@OURSERVER.OUR-DOMAIN.TOP) unknown
kinit: krb5_get_init_creds: Client (network_user@OURSERVER.OUR-DOMAIN.TOP) unknown
kadmind and kdc are running on the server.
I checked all the Kerberos files and realized that kdc.conf was missing from /private/var/db/krb5kdc. I tried to create it manually by copying the contents of kdc.conf from my test server and substituting all the host and principal names, then rebooted, but there was no improvement. Also, because the conf file was missing, no logging directory was set up, so I can't easily isolate the Kerberos entries; I have to look at the syslog and guess what belongs where. The acl_file.[PRINCIPAL] and m_key.[PRINCIPAL] files did exist in this directory. The Open Directory Configuration log (probably) confirms this: there is no mention of creating or copying kdc.conf, only of failing to open it:
2015-05-29 21:49:04 +0000 Updating kdc.conf
2015-05-29 21:49:04 +0000 Error opening kdc.conf for writing
2015-05-29 21:49:04 +0000 Copied file from /Volumes/Server HD/Recovered Items/private/var/db/krb5kdc/acl_file.OURSERVER.OUR-DOMAIN.TOP to /Volumes/Server HD/private/var/db/krb5kdc/acl_file.OURSERVER.OUR-DOMAIN.TOP.
2015-05-29 21:49:04 +0000 Copied file from /Volumes/Server HD/Recovered Items/private/var/db/krb5kdc/m_key.OURSERVER.OUR-DOMAIN.TOP to /Volumes/Server HD/private/var/db/krb5kdc/m_key.OURSERVER.OUR-DOMAIN.TOP.
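The manual file check described above can be sketched roughly like this. The function name missing_kdc_files is hypothetical, and the list of three files is only what is mentioned in this post, not a complete inventory of what a KDC needs.

```shell
# List which of the three KDC files mentioned above are missing.
# $1 = directory to check, $2 = realm name.
# Prints one missing file name per line, or nothing if all are present.
missing_kdc_files() {
    dir=$1
    realm=$2
    for f in kdc.conf "acl_file.$realm" "m_key.$realm"; do
        [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Example call with the directory and realm from this post:
# missing_kdc_files /private/var/db/krb5kdc OURSERVER.OUR-DOMAIN.TOP
```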
I googled a bit but did not find anything useful. I tried the usual tests and remedies: DNS config is fine, and I tried to rekerberize, but there was no improvement.
If I've understood everything correctly, OS X does not use a separate Kerberos database but rather lets Kerberos search the LDAP database, as can be confirmed by this kdc.conf entry(?):
dbname = od:/LDAPv3/ldapi://%2Fvar%2Frun%2Fldapi
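The last part of that dbname value is a percent-encoded path, where %2F stands for a slash. A small sketch to make it readable; the function name ldapi_socket_path is invented here, and it only handles %2F rather than being a general URL decoder.

```shell
# Strip everything up to and including "ldapi://" and turn each %2F
# into "/", which leaves the path of the local LDAP socket.
ldapi_socket_path() {
    printf '%s\n' "$1" | sed -e 's|^.*ldapi://||' -e 's|%2[Ff]|/|g'
}

ldapi_socket_path 'od:/LDAPv3/ldapi://%2Fvar%2Frun%2Fldapi'
# prints /var/run/ldapi
```

So the KDC is pointed at a local socket, /var/run/ldapi, rather than at a network address. The OpenLDAP command-line tools take the same encoded ldapi:// form as a URI with -H, although whether a query over that socket works on this particular server is untested.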
So I kept on googling, trying to get my head around how to interact with LDAP. I found the ldapsearch command, and the results are weird. On my test server, I can successfully run
ldapsearch -W -D uid=diradmin,cn=users,dc=testserver,dc=our-domain,dc=top
and get some results, but on the Xserve the same command (with dc=ourserver...) gives me:
ldap_bind: Insufficient access (50)
If I switch the uid to something random I get:
ldap_bind: Invalid credentials (49)
so the server acknowledges the username, but it seems that my diradmin simply is not diradmin. Well, sort of, because I can still authenticate in WGM and Directory Utility. Just for good measure I tried to reset the Open Directory administrator password, but nothing changed.
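To keep the two bind errors apart, here is a rough sketch with two helpers. Both function names are made up for illustration. The first builds the dc=... suffix used in the bind DN above from a host name; the second gives the standard names of result codes 49 and 50 as defined in RFC 4511. Neither one explains why this particular server answers with 50.

```shell
# Turn a host name such as testserver.our-domain.top into the suffix
# dc=testserver,dc=our-domain,dc=top used in the bind DN above.
search_base() {
    printf '%s\n' "$1" | sed -e 's/^/dc=/' -e 's/\./,dc=/g'
}

# Standard names of the two LDAP result codes seen above (RFC 4511).
ldap_result() {
    case "$1" in
        49) echo "invalidCredentials: bind DN or password was rejected" ;;
        50) echo "insufficientAccessRights: identity lacks rights for the operation" ;;
        *)  echo "not covered by this sketch" ;;
    esac
}

search_base ourserver.our-domain.top
# prints dc=ourserver,dc=our-domain,dc=top
```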
So does anyone have any clue where I should continue looking? Given how much trouble I've run into so far, I'm understandably reluctant to, for instance, archive, destroy and restore the OD Master. And I just hate situations where things can't be repaired in any other way than starting from scratch, because then I'll never understand where it went wrong in the first place.
Thanks/Alexander