Apple Event: May 7th at 7 am PT

Looks like no one’s replied in a while. To start the conversation again, simply ask a new question.

DirectoryServices crashing since 10.6.3

Hi,
I just upgraded to 10.6.3 on an XServe, and it seemed to work ok. But after a while (maybe an hour) nobody can log on to the server any more. Not ServerAdmin, not ScreenSharing... I have twice now just powered it off, since I didn't see anything else I could do. Anyway, after last restart I just stayed logged in to see if anything happening. The log entries that I assume are the relevant ones:


Apr 18 19:03:19 mail com.apple.DirectoryServices[29]: Assertion failed: (name1), function rbt compare_namekey, file /SourceCache/DirectoryServiceDaemon/DirectoryService-621.3/Server/Mbrd_HashTabl e.c, line 88.
Apr 18 19:03:19 mail com.apple.launchd[1] (com.apple.DirectoryServices[29]): Job appears to have crashed: Abort trap
Apr 18 19:03:19 mail DirectoryService[3610]: Improper shutdown detected
Apr 18 19:03:19 mail servermgrd[62]: servermgr_accounts: noteDirectoryNodeAdded (reopening nodes)
Apr 18 19:03:19 mail com.apple.ReportCrash.Root[3609]: 2010-04-18 19:03:19.696 ReportCrash[3609:2803] Saved crash report for DirectoryService[29] version ??? (???) to /Library/Logs/DiagnosticReports/DirectoryService 2010-04-18-190319localhost.crash

Apr 18 19:03:21 mail servermgrd[62]: servermgr_accounts: got error 5300 trying to auth to local LDAP node
Apr 18 19:03:21 mail servermgrd[62]: servermgr_accounts: noteDirectorySearchPolicyChanged (reopening nodes)
Apr 18 19:03:21 mail servermgrd[62]: servermgr_accounts: got error 5300 trying to auth to local LDAP node
Apr 18 19:03:31 mail ManagedClient[3613]: MCX.MCXD_StartMCX Unable to find matching user name for uid 1000.
Apr 18 19:03:31 mail com.apple.loginwindow[69]: 2010-04-18 19:03:31.487 ManagedClient[3613:903] MCX.MCXD_StartMCX Unable to find matching user name for uid 1000.
Apr 18 19:03:31 mail ManagedClient[3613]: ManagedClient- runloop could not be started; status = -23
Apr 18 19:03:31 mail com.apple.loginwindow[69]: 2010-04-18 19:03:31.489 ManagedClient[3613:903] ManagedClient- runloop could not be started; status = -23

Apr 18 19:38:18 mail servermgrd[62]: servermgr_accounts: got error 5300 trying to auth to local LDAP node
Apr 18 19:38:50 mail com.apple.launchd.peruser.1000[280] (com.apple.PreferenceSyncAgent[4388]): The account "dlorenz" has been deleted out from under us!
Apr 18 19:38:50 mail com.apple.launchd.peruser.1000[280] (com.apple.PreferenceSyncAgent[4388]): In a future build of the OS, this error will be fatal.
Apr 18 19:38:52 mail com.apple.launchd.peruser.1000[280] (com.apple.SubmitDiagInfo[4389]): The account "dlorenz" has been deleted out from under us!
Apr 18 19:38:52 mail com.apple.launchd.peruser.1000[280] (com.apple.SubmitDiagInfo[4389]): In a future build of the OS, this error will be fatal.
Apr 18 19:38:52 mail /System/Library/CoreServices/coreservicesd[104]: scserverServerCheckin: client uid validation failure; getpwuid(1000) == NULL
Apr 18 19:38:58 mail com.apple.launchd.peruser.1000[280] ([0x0-0x12012].com.apple.iCal.helper[4393]): The account "dlorenz" has been deleted out from under us!
Apr 18 19:38:58 mail com.apple.launchd.peruser.1000[280] ([0x0-0x12012].com.apple.iCal.helper[4393]): In a future build of the OS, this error will be fatal.
Apr 18 19:38:58 mail /System/Library/CoreServices/coreservicesd[104]: scserverServerCheckin: client uid validation failure; getpwuid(1000) == NULL


The last bunch about this being fatal reappears every hour from then on...

Any ideas where/what to check?

Thanks,
Dix

Posted on Apr 18, 2010 2:20 PM

Reply
52 replies

Apr 27, 2010 3:54 PM in response to Ronald Altmann

Ronald Altmann wrote:
I will try to open a support ticket at Apple's Server support.


Thanks, that would be great. I'm at least feeling better that I've found others with the issue and we seem to be honing in on it's cause. I was starting to loose my mind with this. I had to install a monitor and keyboard on the server just to be able to revive it on a constant basis. Normally this thins is up for months at a time.

Apr 27, 2010 6:55 PM in response to mawt

mawt wrote:
I was excluding the complete startup disk, and only backing up an attached drive.

And I was using a grown server, not the combo update. And it wasn't happening all the time. About once a day.

cheers
Mawt

That's pretty much my experience. Time Machine is causing the OD crashes when it completes a backup. As you said, not every time just once every day or two. I have TM off now and it's still good (working on the third day).

Message was edited by: offdwall

Message was edited by: offdwall

Apr 28, 2010 4:06 AM in response to Dix Lorenz

Just to add another "me too", I have a client with a Mac mini 10.6.3 Server where the same error occurs.

I don't know if its of any importance but the crash and the running of hdiejectd are always together time-wise in the system.log.

For the moment I've switched off time machine but clearly that's no solution at all.

Any help would be very much appreciated.

Relevant log part:

Apr 28 12:06:07 solserver com.apple.backupd\[2075\]: Starting standard backup
Apr 28 12:06:08 solserver com.apple.backupd\[2075\]: Backing up to: /Volumes/Time Machine/Backups.backupdb
Apr 28 12:06:09 solserver servermgrd\[71\]: servermgr_backup: TimeMachinePreBackupHook called.
Apr 28 12:06:25 solserver hdiejectd\[2129\]: running
Apr 28 12:06:31 solserver serveradmin\[2148\]: -\[AccountsRequestHandler(AccountsOpenDirectoryHelpers) openLocalLDAPNodeIfNeeded\]: dsLocalLDAP = (null), error = Error Domain=com.apple.OpenDirectory Code=2000 UserInfo=0x10022ac60 "Unable to open Directory node with name /LDAPv3/127.0.0.1."
Apr 28 12:06:31 solserver serveradmin\[2149\]: Error retrieving Global Shared Address Book info: Error Domain=com.apple.SharedAddressBookAdmin Code=1 UserInfo=0x1005023b0 "Record with GUID '1986FC8D-AE37-48DB-A661-02973E83FD14' not found"
Apr 28 12:06:37 solserver hdiejectd\[2129\]: quitCheck: calling exit(0)
Apr 28 12:07:21 solserver com.apple.DirectoryServices\[28\]: Assertion failed: (name1), function rbt compare_namekey, file /SourceCache/DirectoryServiceDaemon/DirectoryService-621.3/Server/Mbrd_HashTabl e.c, line 88.
Apr 28 12:07:21 solserver hdiejectd\[2291\]: running
Apr 28 12:07:24 solserver com.apple.launchd\[1\] (com.apple.DirectoryServices\[28\]): Job appears to have crashed: Abort trap
Apr 28 12:07:24 solserver DirectoryService\[2294\]: Improper shutdown detected
Apr 28 12:07:24 solserver com.apple.ReportCrash.Root\[2290\]: 2010-04-28 12:07:24.967 ReportCrash\[2290:1407\] Saved crash report for DirectoryService\[28\] version ??? (???) to /Library/Logs/DiagnosticReports/DirectoryService 2010-04-28-120724localhost.crash

Apr 28, 2010 4:39 PM in response to Ernst Mulder

There is another thread discussing what appears to be the same beastie:

http://discussions.apple.com/thread.jspa?threadID=2171224

From what I have worked out (plus some guesses), is that the TimeMachine backups call a script (not sure where it is) to prepare for the backup (TimeMachinePreBackupHook).

Amongst other things, the TMPreBackupHook dumps out the OD database into an LDIF file that is suitable for backing up (since the actual DB is not in a suitable stable state for backing up), and I think this is where the problem lies. What that means is that excluding directories from TM is not going to help, as the TMPreBackupHook script still gets called.

For some, the problem has manifest before 10.6.3, and others are getting it now. As yet, we don't know what configurations trigger it, and why some get it now, and some got it before.

I'm guessing the issue is possibly a timing issue between the TM script dumping out the LDIF file and OD doing some routine task.

The result of the OD crash is that the authentication process for other services (like SMB, AFP, logging in etc) fails, and this is often where we first notice it.

My suggestion is to either turn off TM and run it manually when you are able to restart / repair the OD database, or cross your fingers that you are one of the lucky ones (if there are any).

I had hoped that 10.6.3 would be the solution, but given this thread, I have just turned off my TM without giving it enough time to fail. I won't be turning it on again until an update that claims to fix the TM issue, or someone finds a likely solution.

Apr 29, 2010 12:31 AM in response to SnoRock

A timing issue, you might be on to something here.

The OD crash is always right in between "TimeMachinePreBackupHook called" and "TimeMachinePreBackupHook done".

TimeMachinePreBackupHook doesn't seem to be a file based script, at least I can not find it. But it seems to backup server-specific stuff in a directory named /.ServerBackups

When TimeMachinePreBackupHook stops the OD the log is filled with OD related errors until TimeMachinePreBackupHook is done. I'm guessing that OD being down is not causing the crash. Here is part of the log on a successful run of Time Machine on the server ->

Apr 28 02:34:59 solserver serveradmin\[6898\]: -\[AccountsRequestHandler(AccountsOpenDirectoryHelpers
) openLocalLDAPNodeIfNeeded\]: dsLocalLDAP = (null), error = Error Domain=com.apple.OpenDirectory Code=2000 UserInfo=0x10022a190 "Unable to open Directory node with name /LDAPv3/127.0.0.1."
Apr 28 02:34:59 solserver serveradmin\[6899\]: -\[AccountsRequestHandler(AccountsOpenDirectoryHelpers) openLocalLDAPNodeIfNe[eded\]: dsLocalLDAP = (null), error = Error Domain=com.apple.OpenDirectory C
ode=2000 UserInfo=0x100501600 "Unable to open Directory node with name /LDAPv3/127.0.0.1."
Apr 28 02:34:59 solserver serveradmin\[6900\]: Error retrieving Global Shared Address Book info: Error Domain=com.apple.SharedAddressBookAdmin Code=1 UserInfo=0x100221650 "Record with GUID '1986FC8D
-AE37-48DB-A661-02973E83FD14' not found"

This is what is different in the log when OD crashes and is not restarted ->

Apr 28 12:07:21 solserver com.apple.DirectoryServices\[28\]: Assertion failed: (name1), function rbt compare_namekey, file /SourceCache/DirectoryServiceDaemon/DirectoryService-621.3/Server/Mbrd_HashTabl e.c, line 88.

So, what is in Mbrd_HashTable.c on line 88? 🙂

Since the assertion failure is reported by com.apple.DirectoryServices can we assume that OD crashes during its forced shutdown or restart?

Apr 29, 2010 12:59 AM in response to Ernst Mulder

This is what I find in Mbrd_HashTable.c:

static int rbt compare_namekey( const struct rb_node *n, const void *v )
{
const char *name1 = RBNODE TOUSERGROUP(n)->fName;
const char *name2 = (const char *) v;

assert(name1); *// <- this is line 88*
assert(name2);
return strcmp( name1, name2 );
}

The source is here:

<a class="jive-link-external-small" href="http://">http://www.opensource.apple.com/source/DirectoryService/Director yService-621.3/Server/Mbrd_HashTable.c?txt

This doesn't help. No way to find out when name rbt_ops->rbto_comparekey is invoked and why the assertion on name1 fails. It seems to be used in rb treeinit (defined in rb.c).

Apr 29, 2010 5:06 AM in response to Ernst Mulder

I think what's at root here, is that serveradmin makes a call to OpenDirectory and that fails,
because at the moment in question, OD is not available.

It is possible to cleanly stop (and then restart) slapd, and you can find several example scripts for backing up OD at afp548.com

If you use Server Admin to export/backup OD, you'll see in the logs that it invokes
slapconfig -backupdb

So, whatever is happening is taking out slapd UN-cleanly, after which some routine engaged by TM tells serveradmin to make a call to OD and that fails.

May 1, 2010 8:46 PM in response to Dix Lorenz

Same issue. Took us a while to figure it out. Right after the 10.6.3 update, we started seeing users losing AFP connectivity. I associated it with Time Machine, saw some of the early posts on the topic last week to confirm my suspicions, and decided to turn it off.

What I DIDN'T notice was that Time Machine had also created an enormous (70GB) log file of some sort. At the same time, we had our other logs in debug mode, so -- you guessed it - the SSD drive that houses the system on our Xserve filled up. Friday morning, there were no services at all. And - BEWARE - Open Directory was completely overwritten. We lost all users, share points, permissions, etc.

We deleted the huge file, restored from the most recent archive and got it operational again.

We turned on Time Machine. We immediately lost AFP and Mail access, but let it run anyway, just to see.

About 4 hours later, 40 GB of space had been eaten up by Time Machine on the SSD system drive (yet we're backing up to a Time Capsule). We turned Time Machine off and AFP and Mail access were restored. We did it again, just in awe of the inexcusable sloppiness of it. Turn on Time machine, lose Open Directory and kill your system drive space. Turn it off, go back to normal.

Wow.

May 8, 2010 12:31 PM in response to Ronald Altmann

Ronald Altmann wrote:
... Also I will try to open a support ticket at Apple's Server support. This is ridiculous!

Cheers,
Ron.


Have you had any luck with opening a support ticket with Apple?

For the record, since turing TM off, my 10.6.3 server has been up 11days, 14hours, 35 min with no OD crashes. Problem certainly located, but not solved since I can't run TM on the server and had to revert back to backing it up with Retrospect.

May 26, 2010 8:45 PM in response to Dix Lorenz

I was experiencing the same issues. OpenDirectory was restarting every time TimeMachine backed up the server. Every now and then, I'd experience authentication errors and missing share points.

I'm currently using the following workaround:
It appears that TimeMachinePreBackupHook was calling ServerBackup (based on what others have said) to backup Open Directory. This is when it would crash. ServerBackup uses plist files in /etc/server_backup. I simply moved the plists for Open Directory and Share Points. My server has been running perfectly for about 2 days without a single restart of Open Directory.

To move the files:
sudo mkdir /etc/server_backup.disabled
sudo mv /etc/server_backup/40-openDirectory.plist /etc/server_backup.disabled/
sudo mv /etc/server_backup/55-sharePoints.plist /etc/server_backup.disabled/



Note: moving the files back to their original locations on my test server caused the problems to reoccur, so I'm confident in this temporary fix (until there's an official fix from Apple)

DirectoryServices crashing since 10.6.3

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple ID.