14 Replies Latest reply: Feb 15, 2008 5:56 AM by pterobyte
Jeffrey Lee Level 4 (2,495 points)
I just installed 10.5.1 yesterday, and some time this AM, around 4:44, the server tried to reboot for some reason but never finished. When I got in, the computer was not stalled in a kernel panic, but stuck on the Mac logo during the startup process. Once I restarted it, all appears well.

I've looked at the logs, and honestly don't know what I'm looking at or for. And of course, there are more logs than you can shake a stick at, so I'm not even sure if I'm looking at the RIGHT log to determine what went wrong.

In the Console/All Messages, there is a string of entries right before the attempted re-start, but there is a string of messages in all the logs starting from when I installed it yesterday. The only time lapse is between the last entries earlier this AM at the failed reboot, and when I came in to re-start it.

Any ideas of where I should look?



FWIW, this message appears every hour, and it looks like the last entry before it crashed:

11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master[26613]) Stray process with PGID equal to this dead job: PID 26614 PPID 1 ctl_cyrusdb
11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master[26613]) Exited abnormally: User defined signal 1
11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master) Throttling respawn: Will start in 8 seconds

If anyone can assist in helping me solve this, I'd be more than grateful.


TIA

Message was edited by: Jeffrey Lee

Quad 2.5, Dual 2.3 G5, and 15" MBP, Mac OS X (10.4.10)
  • 1. Re: Server didn't reboot
    Roger Smith3 Level 6 (13,475 points)
    It looks like your mail server is crashing repeatedly and launchd is trying to restart it too fast, so it waits a little while before trying again.
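To get a feel for how often this happens, the launchd entries quoted in the first post can be tallied with a short script. This is only a sketch: the regular expression assumes the Console line layout shown above (timestamp, `com.apple.launchd[pid]`, job label in parentheses, message), and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the three lines quoted in the first post:
# <date> <time> AM|PM com.apple.launchd[<pid>] (<job>[<pid>]) <message>
LINE_RE = re.compile(
    r"^(?P<stamp>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) "
    r"com\.apple\.launchd\[\d+\] "
    r"\((?P<job>[^\[\)]+)(?:\[(?P<pid>\d+)\])?\) "
    r"(?P<message>.*)$"
)

def summarize(lines):
    """Count abnormal exits and respawn throttles per launchd job label."""
    exits = Counter()
    throttles = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a launchd job line in the assumed layout
        job = match.group("job")
        message = match.group("message")
        if message.startswith("Exited abnormally"):
            exits[job] += 1
        elif message.startswith("Throttling respawn"):
            throttles[job] += 1
    return exits, throttles

# The three log lines from the first post.
sample = [
    "11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master[26613]) "
    "Stray process with PGID equal to this dead job: PID 26614 PPID 1 ctl_cyrusdb",
    "11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master[26613]) "
    "Exited abnormally: User defined signal 1",
    "11/21/07 4:44:56 AM com.apple.launchd[1] (edu.cmu.andrew.cyrus.master) "
    "Throttling respawn: Will start in 8 seconds",
]

exits, throttles = summarize(sample)
print("abnormal exits:", dict(exits))
print("throttled respawns:", dict(throttles))
```

Feeding it a whole day of saved Console output instead of `sample` would show whether the crash really repeats every hour.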

    Roger
  • 2. Re: Server didn't reboot
    Christian Leue Level 1 (10 points)
    Hi Jeffrey,

    I'm experiencing the same problems.

    To me, it appears as if ctl_cyrusdb is taking far too long to complete, so launchd is killing it. This happens repeatedly. With each new attempt, disk space vanishes until it is finally exhausted. I assume the act of killing ctl_cyrusdb does not free the memory it has allocated for itself.

    Running ctl_cyrusdb manually on my machine takes about two minutes, far longer than the default 20 seconds launchd waits for.

    I don't yet know why running ctl_cyrusdb takes so long. There are only a dozen mailboxes, and I haven't yet figured out how to turn on detailed logging of what it is waiting for.
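The manual timing described above can be repeated with a small wrapper that runs a command and compares the elapsed time with the 20-second window. This is a minimal sketch: `time_command` is a made-up helper, the 20-second default simply mirrors the figure quoted in this post, and the path to ctl_cyrusdb in the comment is an assumption that should be checked on the server.

```python
import subprocess
import sys
import time

def time_command(argv, limit_seconds=20.0):
    """Run argv to completion and return (elapsed_seconds, exceeded_limit)."""
    start = time.monotonic()
    subprocess.run(argv, check=False)
    elapsed = time.monotonic() - start
    return elapsed, elapsed > limit_seconds

# Harmless stand-in so the sketch runs anywhere. On the server, replace it with
# the real invocation, e.g. ["/usr/bin/cyrus/bin/ctl_cyrusdb", "-r"]
# (path assumed, not confirmed in this thread).
command = [sys.executable, "-c", "pass"]

elapsed, too_slow = time_command(command)
print(f"{' '.join(command)} took {elapsed:.1f} s")
print("longer than the limit" if too_slow else "within the limit")
```

Running it a few times in a row would also show whether the two minutes is consistent or only happens on the first run after a crash.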

    Cheers,
    Christian
  • 3. Re: Server didn't reboot
    Roger Smith3 Level 6 (13,475 points)
    If launchd has a default of 20 seconds, then I would assume that it's either set, or can be adjusted in the plist file for ctl_cyrusdb in /System/Library/LaunchDaemons.
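A quick way to check is to read the job's plist and look for the timing keys that launchd.plist(5) documents: `TimeOut`, `ExitTimeOut` and `ThrottleInterval`. The sketch below parses a made-up job definition; the real file would presumably be named after the job label in the log (edu.cmu.andrew.cyrus.master), but that file name and the keys it actually contains are assumptions, and `timing_settings` is a hypothetical helper.

```python
import plistlib

# Timing-related keys documented in launchd.plist(5).
TIMING_KEYS = ("TimeOut", "ExitTimeOut", "ThrottleInterval")

def timing_settings(plist_bytes):
    """Return the timing keys that are explicitly set in a launchd job plist."""
    job = plistlib.loads(plist_bytes)
    return {key: job[key] for key in TIMING_KEYS if key in job}

# Made-up job definition for illustration only; the real plist may differ.
example = plistlib.dumps({
    "Label": "edu.cmu.andrew.cyrus.master",
    "ProgramArguments": ["/path/to/master"],  # placeholder path
    "ExitTimeOut": 20,
})

print(timing_settings(example))
```

On the server, the bytes would come from something like `open(path, "rb").read()`. An empty result means the job relies on launchd's defaults rather than on values set in the file.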

    It may be the size, or number of emails in the db rather than the number of accounts that's taking so long. Whoever is the default alias for the mail server may be getting all the spam.

    Roger
  • 4. Re: Server didn't reboot
    Jeffrey Lee Level 4 (2,495 points)
    It's gotta be something with Mail.

    I've got only a small handful of users... and for a new install, the mail database is already 317 MB... with only 5 MB in the Mail data store...

    Spam is NOT being delivered to and deleted by the users' mail clients... I just don't see that much getting through, so it must be stored on the server somewhere. I'm changing the settings to Delete Junk Mail and notify the admin user (me), so we'll see what happens...
  • 5. Re: Server didn't reboot
    Christian Leue Level 1 (10 points)
    I've been able to get the machine running again by adding the "-x" flag to

    recover cmd="ctl_cyrusdb -r -x"

    in /etc/cyrus.conf. That just bypasses the database check of course, now I have to find the cause of the problem... rebuilding the users database with mailbfr and converting all cyrus database files from Berkeley DB to Skiplist did not help.
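To confirm which flags a given cyrus.conf actually passes on its recover entry, the file can be scanned with a few lines of Python. This is a sketch only: `recover_flags` is a made-up helper, and the sample text imitates the START section with the line quoted above rather than a real file.

```python
import re
import shlex

# Matches an uncommented entry such as:  recover cmd="ctl_cyrusdb -r -x"
RECOVER_RE = re.compile(r'^\s*recover\s+cmd="([^"]*)"', re.MULTILINE)

def recover_flags(conf_text):
    """Return the flags on the `recover` entry, or None if there is no such entry."""
    match = RECOVER_RE.search(conf_text)
    if match is None:
        return None
    argv = shlex.split(match.group(1))
    return [arg for arg in argv[1:] if arg.startswith("-")]

# Illustrative fragment; on the server the text would be read from /etc/cyrus.conf.
sample = '''START {
  recover cmd="ctl_cyrusdb -r -x"
}
'''

flags = recover_flags(sample)
print("recover flags:", flags)
print("database check bypassed" if "-x" in flags else "database check enabled")
```

This makes it easy to spot a server that is still running with the "-x" workaround after the underlying problem has been fixed.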

    Cheers,
    Christian
  • 6. Re: Server didn't reboot
    Ivailo Djilianov Level 1 (25 points)
    Having very similar trouble here.
    Manually running ctl_cyrusdb -r reports nothing, while on mail server launch launchd respawns ctl_cyrusdb endlessly. "-x" doesn't help either.
    I ended up in this situation after an Intel->PPC migration.
  • 7. Re: Server didn't reboot
    pterobyte Level 6 (10,910 points)
    Have you actually tried reconstructing the cyrus database from scratch?
  • 8. Re: Server didn't reboot
    Ivailo Djilianov Level 1 (25 points)
    I tried the following:
    Stopped the mail server
    Removed the contents of /var/imap/db as per http://docs.info.apple.com/article.html?artnum=306738
    Ran mailbfr -f, which reports fixing the db and takes a good 30 minutes, which I believe is good (at the beginning of the mailbfr run, syslog reports: reconstruct: AOD: user opts: get attributes for user: <username> failed with error: -14479)
    It doesn't seem to help.
  • 9. Re: Server didn't reboot
    Ivailo Djilianov Level 1 (25 points)
    And I'm afraid running the reconstruct command manually panics with a fatal region error.
  • 10. Re: Server didn't reboot
    pterobyte Level 6 (10,910 points)
    This is odd behaviour.

    You could try the following:
    Stop mail services
    Change the database location from /var/imap (assuming you kept defaults) to let's say /var/imapnew
    Save changes
    Run "mailbfr -o"
    Run "mailbfr -f"
    Start mail services

    If that doesn't help then it is not a cyrus database issue, but some underlying problem.
  • 11. Re: Server didn't reboot
    Ivailo Djilianov Level 1 (25 points)
    Update: Server magically 'healed' itself. I know how naive this must sound and believe me, I can't begin to describe how frustrated I'm feeling!
    Anyway, the bad thing is that this db corruption seems to repeat itself every time the mail service is restarted, so I can't say I've found a particularly stable solution.
  • 12. Re: Server didn't reboot
    pterobyte Level 6 (10,910 points)
    Since you migrated, could it be your cyrus.conf and imapd.conf were carried over from Tiger? If yes, compare them to Leopard's default files.
  • 13. Re: Server didn't reboot
    Ivailo Djilianov Level 1 (25 points)
    Thanks, this solved it! (I hope)
    I did what you suggested and it turned out I had some lines missing from the imapd.conf file. I copied them over from the imapd.conf.default and the mail services are now running. I even tested with a full system restart.
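For anyone repeating this comparison after a migration, the missing options can be listed mechanically instead of by eye. The sketch below assumes the usual `option: value` layout of imapd.conf with `#` comments; the helper names and the sample option values are illustrative, not taken from a real Leopard default file.

```python
def option_names(conf_text):
    """Collect option names from `name: value` lines, skipping blanks and comments."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, _value = line.partition(":")
        if sep:
            names.add(name.strip())
    return names

def missing_options(current_text, default_text):
    """Return options present in the default file but absent from the current one."""
    return sorted(option_names(default_text) - option_names(current_text))

# Illustrative contents; on the server these would be the text of imapd.conf
# and imapd.conf.default.
default_conf = (
    "configdirectory: /var/imap\n"
    "partition-default: /var/spool/imap\n"
    "sievedir: /usr/sieve\n"
)
migrated_conf = "configdirectory: /var/imap\n"

print(missing_options(migrated_conf, default_conf))
```

Only option names are compared here, so an option that exists in both files with a different value will not show up; a plain diff of the two files covers that case.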
    Note: launchd still takes a couple of process respawns of ctl_cyrusdb before reporting 'done verifying cyrus databases' and proceeds to launch the mail services. I really hope this is normal behavior.
    Once again, Pterobyte, thanks very very much!
  • 14. Re: Server didn't reboot
    pterobyte Level 6 (10,910 points)
    Note: launchd still takes a couple of process respawns of ctl_cyrusdb before reporting 'done verifying cyrus databases' and proceeds to launch the mail services.


    This is absolutely normal. Depending on the size of your db, this could even take a while.
    Glad you got it sorted.