Kerberos stopped - giant kdc.log
Recently, after losing the ability to authenticate against the domain, I found Server Admin indicating that Kerberos was stopped. The DNS is on that same server, and was the first place I looked. I noted that the reverse lookup for the server's IP was pointing to the name server's host name, so I changed it to the primary host name of the OD server.
I'm not sure how the PTR record was altered, as very few people have the needed access. However, once changed and the server rebooted, user authentication AND single sign-on were working again. Oddly, Server Admin still reports that Kerberos is stopped.
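For anyone wanting to test the same forward/reverse mismatch without clicking through Server Admin, here is a minimal Python sketch. The function name `ptr_matches` is made up for illustration. It uses the system resolver through the standard `socket` module, so local hosts files can affect the result as well as DNS.

```python
import socket


def ptr_matches(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that the reverse lookup for hostname's address points back at hostname.

    Returns (ok, ip, ptr_name). The resolver functions are parameters so the
    comparison can be exercised without touching a live name server.
    """
    ip = forward(hostname)       # forward lookup: name -> address
    ptr_name = reverse(ip)[0]    # reverse lookup: address -> primary name
    # DNS names are case-insensitive and may carry a trailing dot.
    ok = ptr_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ok, ip, ptr_name
```

Calling `ptr_matches("od1.domain.lan")` (a placeholder name, substitute the OD master's real fully qualified host name) should return `False` in the first position if the reverse record points at some other name, which is the situation described above.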
Days have passed, and I've already ordered a couple of Mac Mini Servers to replace our aging Xserve G4 OD system. Even so, I still want to know what happened and how to fix it without resorting to the archive-demote-promote-restore routine. While copying one of the OD server's backup drives onto a testbed Xserve, I noticed that one file was taking a very long time to copy: /var/log/krb5kdc/kdc.log
So, I'm thinking 3.08GB is a bit much for a log file. While Googling, I found the following synopsis of the krb5.conf file:
http://docs.sun.com/app/docs/doc/816-0219/6m6njqb94?a=view
Of specific interest is this:
"kdc_rotate:
A relation subsection that enables kdc logging to be rotated to multiple files based on a time interval. This can be used to avoid logging to one file, which may grow too large and bring the KDC to a halt."
Being uncertain whether kdc.log would simply regenerate if deleted, I initially chose to set up krb5.conf with the appropriate settings. Lesson 1: /etc/krb5/krb5.conf does not exist in Leopard Server. Look for /Library/Preferences/edu.mit.Kerberos instead. Ignoring the warning about how the file is automatically generated, I added the period and versions parameters and rebooted. Nada. Oh, yeah, forgot about the periodic routines...
I ran sudo periodic daily weekly monthly. No joy; the 3GB kdc.log file was still there.
I then noticed that the file /etc/periodic/daily/601.daily.server.krb5kdc did not have the executable bit set (the same is true on the other 10.5 servers I checked). I set the 601 script as executable and ran the daily routine again. This time, the kdc.log file was renamed kdc.1.log. After rebooting, however, a new kdc.log file didn't appear. This could have been due to my messing with the permissions on the krb5kdc log directory. Or is the 601 daily script set as non-executable precisely because it doesn't work?
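To make the rename step concrete, here is a rough Python sketch of numbered rotation using the same kdc.log to kdc.1.log naming I saw. It is not what Apple's 601 script does internally (I haven't confirmed that). The function name `rotate_log` and the `keep` parameter are my own inventions.

```python
import os


def rotate_log(path, keep=5):
    """Rename path to <stem>.1<ext>, shifting older copies up by one.

    Copies numbered beyond `keep` are deleted. An empty file is left at the
    original path so the log location always exists.
    """
    stem, ext = os.path.splitext(path)   # ".../kdc.log" -> (".../kdc", ".log")
    oldest = f"{stem}.{keep}{ext}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Work from the highest number down so nothing is overwritten.
    for n in range(keep - 1, 0, -1):
        src = f"{stem}.{n}{ext}"
        if os.path.exists(src):
            os.rename(src, f"{stem}.{n + 1}{ext}")
    if os.path.exists(path):
        os.rename(path, f"{stem}.1{ext}")
    # Recreate an empty log; ownership and mode are those of whoever runs this.
    open(path, "a").close()
```

One general Unix caveat: a daemon that already has the log open keeps writing to the renamed file until it is restarted or otherwise reopens its log. So a rename alone does not free the space or start a fresh file for a running process.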
At this point, I restored from the backup and began again. Before booting from the restored drive, I removed the 3GB log file and crossed my fingers. This time the kdc.log file was created. But, for every step forward... Now LDAP is stopped and slapd keeps crashing, with these log entries repeating:
------------------------
Nov 6 18:12:16 od1 slapd[573]: @(#) $OpenLDAP: slapd 2.3.27 (Jan 26 2009 09:49:21) $
Nov 6 18:12:17 od1 slapd[573]: overlay_config(): warning, overlay "dynid" already in list
Nov 6 18:12:17: --- last message repeated 4 times ---
Nov 6 18:12:17 od1 slapd[573]: bdb_db_open: unclean shutdown detected; attempting recovery.
Nov 6 18:12:17 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): Ignoring log file: /var/db/openldap/openldap-data/log.0000000470: magic number 0, not 40988
Nov 6 18:12:17 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): Invalid log file: log.0000000470: Invalid argument
Nov 6 18:12:17 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): First log record not found
Nov 6 18:12:17 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): PANIC: Invalid argument
Nov 6 18:12:18 od1 slapd[573]: bdb_db_open: Database cannot be recovered, err -30978. Restore from backup!
Nov 6 18:12:18 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
Nov 6 18:12:18 od1 slapd[573]: bdb(dc=od1,dc=domain,dc=lan): txn_checkpoint interface requires an environment configured for the transaction subsystem
Nov 6 18:12:18 od1 slapd[573]: bdb_db_close: txn_checkpoint failed: Invalid argument (22)
Nov 6 18:12:18 od1 slapd[573]: backend_startup_one: bi_db_open failed! (-30978)
Nov 6 18:12:18 od1 slapd[573]: bdb_db_close: alock_close failed
Nov 6 18:12:18 od1 slapd[573]: slapd stopped.
Nov 6 18:12:18 od1 slapd[573]: connections_destroy: nothing to destroy.
------------------------
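Since the same block repeats on every slapd launch, it helps to boil it down to two facts: which Berkeley DB transaction log is being rejected, and which error code recovery gives up with. Here is a small Python sketch for pulling those out of a longer log. The function name `summarize_bdb_errors` is hypothetical, and the patterns are based only on the wording in the excerpt above, so other slapd versions may phrase things differently.

```python
import re

# Patterns taken from the message wording in the excerpt above.
BAD_LOG = re.compile(r"Ignoring log file: (\S+):")
UNRECOVERABLE = re.compile(r"Database cannot be recovered, err (-?\d+)")


def summarize_bdb_errors(lines):
    """Return (bad_files, error_codes) found in an iterable of syslog lines.

    bad_files keeps first-seen order without duplicates; error_codes is a
    sorted list of the distinct numeric codes.
    """
    bad_files = []
    codes = set()
    for line in lines:
        m = BAD_LOG.search(line)
        if m and m.group(1) not in bad_files:
            bad_files.append(m.group(1))
        m = UNRECOVERABLE.search(line)
        if m:
            codes.add(int(m.group(1)))
    return bad_files, sorted(codes)
```

Feeding it the open system log, e.g. `summarize_bdb_errors(open("/var/log/system.log"))`, is one way to use it. That path is an assumption about where these slapd messages end up on a given server.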
Before I head down the archive-demote-promote-restore path, does anyone have any ideas? Additionally, what mechanism is there to keep this log file from reaching 3GB if the periodic maintenance routines don't work?
Does anyone else have a huge kdc.log file?
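If you want to check your own server, here is a minimal Python sketch that lists files above a size threshold under a directory. The function name `oversized_logs` and the 1 GB default are arbitrary choices of mine.

```python
import os


def oversized_logs(root, limit_bytes=1024 ** 3):
    """Return [(path, size_in_bytes)] for files under root larger than limit_bytes.

    Results are sorted with the biggest file first.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable, or removed while scanning
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Something like `oversized_logs("/var/log")` would be the obvious starting point. It may need to run as root to see inside directories such as krb5kdc, since directories it cannot read are silently skipped.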
Thanks for any thoughts or feedback.
Lyle M
Xserve G4 1.3DP, Mac OS X (10.5.8)