Kevin Neal

Q: collabauthd crashes after restarting the server

I have a reoccurring issue where I'm unable to log into Profile Manager and Wiki after restarting the server and my only option is to restore from a clone.

 

I believe the problem is with Open Directory but I'm not sure how to fix it.

 

After restoring the server the logs show a repeating message of collabauthd throttling and restarting

Posted on Aug 15, 2014 11:31 AM

Close

Q: collabauthd crashes after restarting the server

  • All replies
  • Helpful answers

Page 1 Next
  • by Linc Davis,

    Linc Davis Linc Davis Aug 15, 2014 12:59 PM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 15, 2014 12:59 PM in response to Kevin Neal

    Logs?

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 17, 2014 5:21 AM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 17, 2014 5:21 AM in response to Linc Davis

    I'm just restarting the server again at the moment so I'll gather the logs and post in a few minutes.

     

    To be honest I was hoping you would respond !

     

    In the meantime here is a bit more info.

     

    I make a daily clone of my boot drive using CCC this clone is working perfectly even though it performed a clone after the main boot drive started to show this error, yet if I try to restore the clone the error is still present on the newly cloned drive, which is completely baffling me, in the remote possibility that it is a hard wear issue I have just restored the working clone to a completely different drive, and the error is again present when I boot into it. Which leaves me in a pretty scary position of having just one good copy of the system which I can't restore.

     

     

    This has happened to me a few time and before I have got back up and running by destroying OD and remaking certificates, but I've just tried that this time and it isn't working

     

     

    Sorry for waffling I know you like short answers, but this has got me panicking as I'm off on holiday in a few days and leaving behind a company working on a dodgy server!

     

     

     

    Ok logs ...

     

    System Log

     

    Aug 17 13:09:45 xserve.xxxxxxxxxxxx.co.uk sandboxd[2560] ([2548]): collabpp(2548) deny file-read-data /Library/Preferences/.GlobalPreferences.plist

    Aug 17 13:09:48 --- last message repeated 1 time ---

    Aug 17 13:09:48 xserve com.apple.launchd[1] (com.apple.collabauthd[2608]): Exited with code: 1

    Aug 17 13:09:48 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 7 seconds

    Aug 17 13:09:52 xserve.xxxxxxxxxxxx.co.uk genatsdb[2576]: *GENATSDB* FontObjects generated = 349

    Aug 17 13:09:53 xserve.xxxxxxxxxxxx.co.uk com.apple.time[2492]: Interval maximum value is 946100000 seconds (specified value: 9223372036854775807).

    Aug 17 13:09:57 xserve com.apple.launchd[1] (com.apple.collabauthd[2636]): Exited with code: 1

    Aug 17 13:09:57 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

    Aug 17 13:10:08 xserve com.apple.launchd[1] (com.apple.collabauthd[2659]): Exited with code: 1

    Aug 17 13:10:08 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

    Aug 17 13:10:19 xserve com.apple.launchd[1] (com.apple.collabauthd[2686]): Exited with code: 1

    Aug 17 13:10:19 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

    Aug 17 13:10:24 xserve.xxxxxxxxxxxx.co.uk BPPortsd[77]: Set Last IP Updatedate: 2014-08-17 12:10:24 +0000

    Aug 17 13:10:30 xserve com.apple.launchd[1] (com.apple.collabauthd[2741]): Exited with code: 1

    Aug 17 13:10:30 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

    Aug 17 13:10:41 xserve com.apple.launchd[1] (com.apple.collabauthd[2768]): Exited with code: 1

    Aug 17 13:10:41 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

    Aug 17 13:10:45 xserve.xxxxxxxxxxxx.co.uk screensharingd[2781]: Authentication: SUCCEEDED :: User Name: artisan :: Viewer Address: 192.168.181.205 :: Type: DH

    Aug 17 13:10:52 xserve com.apple.launchd[1] (com.apple.collabauthd[2792]): Exited with code: 1

    Aug 17 13:10:52 xserve com.apple.launchd[1] (com.apple.collabauthd): Throttling respawn: Will start in 9 seconds

     

     

     

    Collaboration Daemon Log

     

    Aug 17 13:11:58 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSExecutionTimer:14 15e2000 +0ms] TIMER: 42ms ---> Executed batch BatchRequest{(

         "Request{ContentService.entityForGUID:(df68acb4-309a-409b-af4c-3193b98b92e1)}",

         "Request{ContentService.canI:entityWithGUID:(read,df68acb4-309a-409b-af4c-3193b 98b92e1)}",

         "Request{ContentService.canI:entityWithGUID:(write,df68acb4-309a-409b-af4c-3193 b98b92e1)}",

         "Request{ContentService.canI:entityWithGUID:(own,df68acb4-309a-409b-af4c-3193b9 8b92e1)}"

      )}

    Aug 17 13:11:58 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:30 15e2000 +0ms] Detaching thread to finish 1 deferred blocks

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSSessionService:152 155f000 +0ms] willExecuteRequest: Request{SessionService.enterMagicalAuthRealm()} [ROOT]

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 155f000 +63ms] TIMER: 63ms ---> Executed Request{SessionService.enterMagicalAuthRealm()}

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSContentService:152 15e2000 +0ms] willExecuteRequest: Request{ContentService.processNotificationsForBatch:(D6DE9649-2865-4D18-95B9-F9 1C79FA4A80-318-000000C4DA505D48)} [ROOT]

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Warning>: [CSContentService:47 15e2000 +15ms] Detected Magic Superuser Auth Token

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 15e2000 +1ms] TIMER: 1ms ---> Executed Request{ContentService.processNotificationsForBatch:(D6DE9649-2865-4D18-95B9-F9 1C79FA4A80-318-000000C4DA505D48)}

    Aug 17 13:12:29 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:30 15e2000 +0ms] Detaching thread to finish 1 deferred blocks

    Aug 17 13:14:54 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSSessionService:152 155f000 +0ms] willExecuteRequest: Request{SessionService.currentOrNewSession()} [ROOT]

    Aug 17 13:14:54 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 155f000 +26ms] TIMER: 26ms ---> Executed Request{SessionService.currentOrNewSession()}

    Aug 17 13:14:54 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSSettingsService:152 16e8000 +0ms] willExecuteRequest: Request{SettingsService.globalSettingForKey:(projectCreators)} [ROOT]

    Aug 17 13:14:54 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 16e8000 +5ms] TIMER: 6ms ---> Executed Request{SettingsService.globalSettingForKey:(projectCreators)}

    Aug 17 13:14:59 xserve.xxxxxxxxxxxx.co.ukcollabd[319] <Info>: [CSSessionService:152 15e2000 +0ms] willExecuteRequest: Request{SessionService.currentOrNewSession()} [ROOT]

    Aug 17 13:14:59 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 15e2000 +72ms] TIMER: 72ms ---> Executed Request{SessionService.currentOrNewSession()}

    Aug 17 13:14:59 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSSettingsService:152 16e8000 +0ms] willExecuteRequest: Request{SettingsService.globalSettingForKey:(projectCreators)} [ROOT]

    Aug 17 13:14:59 xserve.xxxxxxxxxxxx.co.uk collabd[319] <Info>: [CSServiceDispatcher:250 16e8000 +5ms] TIMER: 6ms ---> Executed Request{SettingsService.globalSettingForKey:(projectCreators)}

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 17, 2014 5:30 AM in response to Kevin Neal
    Level 3 (513 points)
    Servers Enterprise
    Aug 17, 2014 5:30 AM in response to Kevin Neal

    As far as I can tell everything else on the server is working normally except the wiki "Log in to access more services" and Profile manager web page, both just hang indefinitely instead of bringing up the login window

     

    I have tried:

     

    unloading and loading com.apple.collabauthd

     

    Deleting Server.app and replacing it

     

    Destroying OD and restoring from backup ( has worked in the past )

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 17, 2014 5:34 AM in response to Kevin Neal
    Level 3 (513 points)
    Servers Enterprise
    Aug 17, 2014 5:34 AM in response to Kevin Neal

    This log may also be relevant:

     

     

    [Sun Aug 17 12:45:56 2014] [notice] FastCGI: process manager initialized (pid 350)

    [Sun Aug 17 12:45:56 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:45:56 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:45:56 2014] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)

    [Sun Aug 17 12:45:58 2014] [notice] Apache/2.2.26 (Unix) DAV/2 PHP/5.3.28 mod_fastcgi/2.4.6 mod_ssl/2.2.26 OpenSSL/0.9.8y configured -- resuming normal operations

    [Sun Aug 17 12:59:13 2014] [crit] (17)File exists: mod_rewrite: Parent could not create RewriteLock file /var/log/apache2/rewrite.lock

    Configuration Failed

    [Sun Aug 17 12:59:21 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:59:21 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:59:21 2014] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)

    [Sun Aug 17 12:59:22 2014] [warn] module fastcgi_module is already loaded, skipping

    Warning: DocumentRoot [/Library/Server/Web/Data/Sites/mail.xxxxxx.co.uk] does not exist

    httpd: apr_sockaddr_info_get() failed for xserve.xxxxxxxx.co.uk

    httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName

    [Sun Aug 17 12:59:31 2014] [notice] FastCGI: process manager initialized (pid 337)

    [Sun Aug 17 12:59:32 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:59:32 2014] [warn] RSA server certificate CommonName (CN) `xserve.xxxxxxxxx.co.uk' does NOT match server name!?

    [Sun Aug 17 12:59:32 2014] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)

    [Sun Aug 17 12:59:33 2014] [notice] Apache/2.2.26 (Unix) DAV/2 PHP/5.3.28 mod_fastcgi/2.4.6 mod_ssl/2.2.26 OpenSSL/0.9.8y configured -- resuming normal operations

    [Sun Aug 17 13:16:17 2014] [error] [client 192.168.181.205] (70007)The timeout specified has expired: proxy: error reading response

    [Sun Aug 17 13:26:26 2014] [error] [client 192.168.181.205] (70007)The timeout specified has expired: proxy: error reading response

  • by Linc Davis,

    Linc Davis Linc Davis Aug 17, 2014 10:49 AM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 17, 2014 10:49 AM in response to Kevin Neal

    The story isn't entirely consistent, but the server does seem to have an SSL certificate with the wrong CN.

     

    You must have a working DNS service, and the server's hostname must match its fully-qualified domain name. To confirm, select the server by name in the sidebar of the Server application window, then select the Overview tab. Click the Edit button on the Host Name line. On the Accessing your Server sheet, Domain Name should be selected. Change the Host Name, if necessary.


    Check the validity of the server's certificates. The common name must match the hostname and domain name. Deselecting and then reselecting the certificate in Server.app has been reported to have an effect in some cases. Otherwise delete all certificates and create new ones.

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 17, 2014 11:42 AM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 17, 2014 11:42 AM in response to Linc Davis

    Checked the common name and it does match the host name.

     

    Checked DNS, host command shows the DNS resolves correctly both directions.

     

    Deselecting and reselecting certificates has no effect.

     

    I'll try deleting the certificates again but I tried that earlier and it didn't help.

     

    The most frustrating thing is that the clone is working fine, even after multiple restarts, yet a block level restore results in a server with the collabauthd error

  • by Linc Davis,

    Linc Davis Linc Davis Aug 17, 2014 12:03 PM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 17, 2014 12:03 PM in response to Kevin Neal

    If that's really true, then you have a failing drive.

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 17, 2014 12:41 PM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 17, 2014 12:41 PM in response to Linc Davis

    Thats what I thought, so I restored the clone to a completely different drive and the same thing happened, tried restoring using both CCC and disk utility.

     

    Also I have just deleted the certificates and remade them but I'm still getting the errors

  • by Linc Davis,

    Linc Davis Linc Davis Aug 17, 2014 6:14 PM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 17, 2014 6:14 PM in response to Kevin Neal

    So everything works fine when you boot from this one volume. but not when you boot from either of two different block copies of it?

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 18, 2014 12:05 AM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 18, 2014 12:05 AM in response to Linc Davis

    Yes strange as it sounds thats correct, I am booted from the working volume at the moment and everything is fine, but I have no way of making a reliable working copy of it as a backup.

     

    I just tried a non block copy of it as well just incase. I've also tried making a backup of the OD of this working volume and then switch to one of the other volumes an destroy the OD and restoring with the backup OD, again not working.

     

    I need to either find a way to make a good copy of the working volume or a way to recover from the error.

     

    Long term I know the most likely solution is to start completely from scratch

  • by Linc Davis,

    Linc Davis Linc Davis Aug 18, 2014 10:01 AM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 18, 2014 10:01 AM in response to Kevin Neal

    How are all these drives connected?

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 18, 2014 1:29 PM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 18, 2014 1:29 PM in response to Linc Davis

    They are Xserve drive bays, each on a separate drive

     

    2009 Xserve running latest Mountain Lion with Server 2.2.2

  • by Kevin Neal,

    Kevin Neal Kevin Neal Aug 18, 2014 1:59 PM in response to Linc Davis
    Level 3 (513 points)
    Servers Enterprise
    Aug 18, 2014 1:59 PM in response to Linc Davis

    I've just tried a slightly different approach, I booted up into the recovery partition, checked the drives and repaired permissions from there then restored again, this time when I booted up the newly restored drive was working, though I did notice wiki server and profile manager took a very long time to start up.

     

    But at least this buys me some time now I have two usable drives.

     

    Would love to get to the bottom of why these services suddenly fail, but as the logs don't seem to give much meaningful information, maybe a fresh start with Mavericks is a good option

  • by Linc Davis,

    Linc Davis Linc Davis Aug 18, 2014 3:51 PM in response to Kevin Neal
    Level 10 (207,995 points)
    Applications
    Aug 18, 2014 3:51 PM in response to Kevin Neal

    After a block copy, the destination volume would have the same permissions as the source volume, so repairing them would have no effect. It's conceivable that you have one good drive bay and two bad ones. I don't have a better theory.

Page 1 Next