Kerberos Conundrum
Greetings all,
First off: we're running OS X Server 10.8.5 providing OD, AFP, DNS, DHCP and a few other services to about 50 or so 10.8.5 and 10.9.x clients all bound to the server.
The problem I'd like help with began when I tried to fix a different one: unrecoverable hangs on network accounts.
Users logged in to their network accounts have been reporting a lot of application freezes (rainbow wheel) that often escalate to a full, unrecoverable lockup of their session, requiring a forced shutdown of the client computer with the power button. After several sessions digging into client and server logs, I believe all of these failures trace back to problems reading from and writing to various sqlite databases, which in turn cause various applications and daemons to freeze, including Firefox, Safari, Mail, Contacts and tccd. A friend of mine who's been at this much longer than I have suggested that a likely cause was an authentication failure in the Kerberos stack. This appeared to pan out, as I was seeing a lot of errors in the server's Kerberos log that looked like this:
2016-09-09T10:15:41 AS-REQ davidi@MYSERVER.MYDOMAIN.NET from 10.23.20.1:50969 for krbtgt/MYSERVER.MYDOMAIN.NET@MYSERVER.MYDOMAIN.NET
2016-09-09T10:15:41 UNKNOWN -- davidi@MYSERVER.MYDOMAIN.NET: no such entry found in hdb
And these:
2016-09-09T10:16:11 TGS-REQ erde-suns-macbo$@MYSERVER.MYDOMAIN.NET from 10.242.4.10:56266 for host/erde-suns-macbook-pro.local@MYSERVER.MYDOMAIN.NET
2016-09-09T10:16:11 Server not found in database: host/erde-suns-macbook-pro.local@MYSERVER.MYDOMAIN.NET: no such entry found in hdb
2016-09-09T10:16:11 Failed building TGS-REP to 10.242.4.10:56266
2016-09-09T10:16:11 tgs-req: sending error: -1765328377 to client
2016-09-09T10:16:11 TGS-REQ erde-suns-macbo$@MYSERVER.MYDOMAIN.NET from 10.242.4.10:55163 for ldap/myserver.mydomain.net@MYSERVER.MYDOMAIN.NET [canonicalize]
2016-09-09T10:16:11 TGS-REQ authtime: 2016-09-09T10:16:11 starttime: 2016-09-09T10:16:11 endtime: 2016-09-09T20:16:11 renew till: unset
2016-09-09T10:16:45 TGS-REQ leahm@MYSERVER.MYDOMAIN.NET from 10.23.20.12:54625 for vnc/PLC-HMI.local@MYSERVER.MYDOMAIN.NET [canonicalize, forwardable]
2016-09-09T10:16:45 Searching referral for PLC-HMI.local
2016-09-09T10:16:45 Returning a referral to realm LOCAL for server vnc/PLC-HMI.local@MYSERVER.MYDOMAIN.NET that was not found
2016-09-09T10:16:45 Server not found in database: krbtgt/LOCAL@MYSERVER.MYDOMAIN.NET: no such entry found in hdb
2016-09-09T10:16:45 Failed building TGS-REP to 10.23.20.12:54625
2016-09-09T10:16:45 tgs-req: sending error: -1765328377 to client
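For anyone reading along, the number in the "sending error" lines can be decoded. A minimal sketch, assuming the usual krb5 error table base of -1765328384, so that the offset from the base is the protocol error code; the function name krb5_error_name is made up for illustration, and only the two "principal unknown" codes are mapped:

```shell
#!/bin/sh
# Translate a krb5 library error number, as printed in the KDC log,
# into its protocol error code and name.
# Assumption: the krb5 error table starts at -1765328384.
# Only two codes are mapped here; everything else is reported as unmapped.
krb5_error_name() {
    code=$(( $1 + 1765328384 ))   # offset from the table base
    case $code in
        6) name=KDC_ERR_C_PRINCIPAL_UNKNOWN ;;   # client not found in database
        7) name=KDC_ERR_S_PRINCIPAL_UNKNOWN ;;   # server not found in database
        *) name=NOT_MAPPED_IN_THIS_SKETCH ;;
    esac
    printf '%s %s\n' "$code" "$name"
}
```

Running krb5_error_name -1765328377 prints "7 KDC_ERR_S_PRINCIPAL_UNKNOWN", which matches the "Server not found in database" lines just before each error.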
I found one thing in our server’s AFP conf that looked awry:
afp:kerberosPrincipal = "afpserver/LKDC:SHA1.AEB9DE7C32B710BB5552569A00257A54B0DF9F58@LKDC:SHA1.AEB9DE7C32B710BB5552569A00257A54B0DF9F58"
According to several articles like this one (https://discussions.apple.com/thread/6037923?tstart=0), that entry really ought to look like this:
afp:kerberosPrincipal = "afpserver/myserver.mydomain.net@MYSERVER.MYDOMAIN.NET"
I went ahead and changed this using serveradmin and rebooted.
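The replacement value follows the usual service/host@REALM pattern. A minimal sketch that assembles it and prints the serveradmin command instead of running it; the two function names are made up for illustration, and the command shown is my best reconstruction of the general "serveradmin settings key = value" form rather than a record of exactly what I typed:

```shell
#!/bin/sh
# Build a service principal of the form service/host@REALM.
# The realm is upper-cased by convention; the host is used as given.
service_principal() {
    realm=$(printf '%s' "$3" | tr '[:lower:]' '[:upper:]')
    printf '%s/%s@%s\n' "$1" "$2" "$realm"
}

# Print (do not execute) the command that would set the AFP principal.
# Arguments: server host name, realm.
afp_principal_command() {
    printf 'sudo serveradmin settings afp:kerberosPrincipal = "%s"\n' \
        "$(service_principal afpserver "$1" "$2")"
}
```

For example, afp_principal_command myserver.mydomain.net myserver.mydomain.net prints the command with the principal afpserver/myserver.mydomain.net@MYSERVER.MYDOMAIN.NET, which can be reviewed before being pasted into a terminal.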
The Kerberos log was then populated with a lot of messages like this one:
2016-09-13T09:47:45 Got a canonicalize request for a LKDC realm from local-ipc
2016-09-13T09:47:45 Asked for LKDC, but there is none
So I then ran:
sudo rm -rf /var/db/krb5kdc
sudo /usr/libexec/configureLocalKDC
And rebooted.
This broke the binding on all the client machines, but the bindings were reestablished once I rebooted the machines that had been up when I made the change.
However, now things in the Kerberos log are a total mess.
I’m now getting TONS of these:
2016-09-16T13:37:28 AS-REQ davidr@MYSERVER.MYDOMAIN.NET from 10.23.5.3:52077 for krbtgt/MYSERVER.MYDOMAIN.NET@MYSERVER.MYDOMAIN.NET
2016-09-16T13:37:28 UNKNOWN -- davidr@MYSERVER.MYDOMAIN.NET: no such entry found in hdb
I'm also seeing a bunch of referrals that look like this:
Got a canonicalize request for a LKDC realm from local-ipc
2016-09-16T13:37:19 LKDC referral to the real LKDC realm name
2016-09-16T13:37:19 AS-REQ teaa@LKDC:SHA1.AEB9DE7C32B710BB5552569A00257A54B0DF9F58 from local-ipc for krbtgt/LKDC:SHA1.AEB9DE7C32B710BB5552569A00257A54B0DF9F58@LKDC:SHA1.AEB9DE7C32B710BB5552569A00257A54B0DF9F58
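To put a number on how many principals the KDC cannot find, here is a rough sketch that tallies the "no such entry found in hdb" lines per principal. It assumes the log format shown in the excerpts above, where the principal is the last word before that message; the function name missing_principals is made up for illustration:

```shell
#!/bin/sh
# Count "no such entry found in hdb" errors per principal.
# Reads KDC log text on stdin and prints "count principal" lines,
# most frequent first.
missing_principals() {
    awk '/no such entry found in hdb/ {
             # Drop the trailing message so the principal is the last field.
             sub(/: no such entry found in hdb.*$/, "")
             count[$NF]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}
```

Usage would be along the lines of missing_principals < kdc.log, substituting wherever the KDC log lives on the server. Running it on a log from before the change and one from after would show whether the set of missing principals actually changed.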
I am tempted to just restore /var/db/krb5kdc from a TM backup and reboot, but that won’t necessarily address the fundamental issue here.
I’d like to ensure that the hdb contains all user, group and computer records and that all krbtgt requests are being referred to the proper db in the proper realm. At least I think that’s what’s gone awry here.
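One way the first half of that could be checked, sketched under assumptions: given two plain-text lists with one principal per line, one of the principals I expect to exist and one dumped from the hdb, print the expected ones that are absent. How each list gets produced is not shown here; the function name missing_from_hdb and both file arguments are hypothetical.

```shell
#!/bin/sh
# Print every line of the "expected" file that does not appear in the
# "dumped" file. Both files hold one principal per line.
# Arguments: expected-list file, dumped-list file (both hypothetical).
missing_from_hdb() {
    awk 'FILENAME == ARGV[1] { have[$0] = 1; next }   # first file: what the hdb has
         !($0 in have)                                # second file: print what is absent
        ' "$2" "$1"
}
```

The comparison is an exact, case-sensitive string match, so davidi@MYSERVER.MYDOMAIN.NET and davidi@myserver.mydomain.net would count as different entries.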
Ready to be thrashed by my betters… all suggestions welcome.
Thanks much,
Paul
MAC MINI SERVER (LATE 2012), OS X Server, 10.8.5