
For all of you having services failing to start

After 48 hours of struggling with this same issue and 6 clean re-installs of Lion Server, I have found the bug in Lion Server that causes all the Ruby-based collaboration services to fail (Device Manager, Wiki, iCal, some Address Book features; in short, a major breakdown of the services around the Server Admin tools and Server.app). It is most visible in Profile Manager because, as you all pointed out, it sometimes even says "Error Reading Settings". And if you take a look at the logs it's even worse: they are full of errors.


That's how I found out, and yes, reading all the logs took time:


Basically they all fail because they depend on the PostgreSQL database.


At first I did 2 clean re-installs and noticed the same thing every time: after spending some time configuring the server (Open Directory/Kerberos, creating accounts/mailboxes, profiles etc.), I would reboot and everything would break.


Now I won't go over all the digging I did, but I finally managed to understand why Postgres was failing at some point.


It seems there is a bug.


If you turn ON "Dedicate Resources to Server Services" in the Server.app Hardware section (next to the Push Notifications switch), postgres doesn't start and all the services that depend on it (lots) fail.


The Solution: Just turn that OFF as shown below and restart. Everything should get back in order. If you still see some "push_notify: not connected" errors in your console logs (it happened to me even though all services were restored), the solution is easy: hit Change and redo the setup with your Apple ID. You'll be issued new certs by Apple and everything should work fine.

User uploaded file

That's all.


Hope this helps the many people who are frustrated like I was. Now that everything works, it's the perfect server for a mini Cloud. You'll love Profile Manager for provisioning payloads to your devices. Elegant, efficient and simple, yet very flexible with the Open Directory backend.


Cheers everyone !


Eric


twitter: @teknologism

Mac OS X (10.7)

Posted on Jul 24, 2011 7:17 PM

119 replies

Oct 31, 2011 4:23 PM in response to jubair

I have contacted Apple Support about my problem with services showing "error reading settings", and they said my issue was due to me changing my IP address after the initial setup screen. So I have had to reinstall yet again; fingers crossed the issue doesn't persist now! His advice was to image a 'virgin' copy of Lion so I don't have to re-download again if I'm still having trouble. What I don't get is that when you do change the IP, the Server app tells you and gives you the option to recover/reset the services??


Hope this helps others too

Oct 31, 2011 5:28 PM in response to biolayer

To be completely honest, I think that's just a BS answer from a first-level support rep. Every time I have had this problem it has been because postgres was crashing. What I do is restore the postgres database from the last Time Machine backup and restart. Luckily I haven't had the problem in about two months now. But it has happened several times.

Nov 1, 2011 6:08 AM in response to Joe Pyrdek

I installed the 10.7.2 patch on my server when it came out, and last night I noticed that all my services were down - and I hadn't restarted.


Postgres was down.


I ran repair permissions ... and then checked the permissions of /var/log and opened it up, and went about manually trying to restart everything and eventually got it all back up again.


Ugh!

Jan 14, 2012 1:16 AM in response to The Teknologist

Hello folks,


I'm not sure if this discussion is still active, but I've been experiencing the same problems and troubleshooting them is not easy.


I'm experiencing the failing services as well, most noticeably the fact that postgres shuts down and won't restart. I've been able to narrow it down to an issue with launchd, which I will explain later.


What happens is that after a reboot most services work fine, as long as I don't have Server.app or the Server Admin tool trying to connect to the server. Connecting with one of these tools often, but not always immediately, results in "Shutting down idle postgres...." messages, after which postgres stops. These messages are not the core of the problem; the real problem is that postgres won't restart.

On a working server idle processes are stopped as well, but new ones are created when needed. On the failing server (a mac-mini 2010 with a clean install) postgres can't be started again.


When trying to start postgres from the command line, a 'cannot start postgres' timeout message appears. And without postgres there is not much running on the system. Once this happens I can't use Server Admin to manage DNS, NetBoot or DHCP, although these are still running.


Diving deeper in the system I noticed that servermgrd is the common factor so I decided to investigate the behaviour of this process.

When trying to manage a service from the command line I used dtruss to follow the execution flow and learned that servermgrd depends on launchd.


And that is where my problem is. launchd is working fine for all users but root. As root I get the message "launch_msg(): Socket is not connected" when typing something like launchctl load -w /System/Library/LaunchDaemons/com.apple.collabd.plist


Even a 'simple' launchctl limit command fails with the same error while launchctl list works fine.


The above commands work fine when I enter them as admin with sudo. The problem then is that the services will start, but with the wrong credentials, which results in failure as well since the logfiles can't be accessed etc. etc.


I've compared all the permissions that I can think of with a working system. Did a repair disk from disk utility but still can't figure out why launchd can't be run as root.


Below is the dtruss trace that shows the program hierarchy; hopefully somebody will speak the magic words that help me fix launchd, or at least can help me troubleshoot it.


/mac-mini:Public root# (dtruss -f "serveradmin start postgres")

PID/THRD SYSCALL(args) = return

...

77524/0xe0324: open("/usr/share/servermgrd/bundles/servermgr_postgres.bundle/Contents/MacOS/servermgr_postgres\0", 0x0, 0x1FF) = 4 0

...

77524/0xe0324: open("/private/var/servermgrd//servermgr_postgres.lock\0", 0x2, 0x1F8) = 4 0

...

77524/0xe0324: access("/bin/launchctl\0", 0x1, 0x400) = 0 0

77524/0xe0324: geteuid(0x7FFF73BA2C40, 0x7FFF73BA1CA0, 0x7FFF73BA2C40) = 0 0

77524/0xe0324: pipe(0x7FFF6BEED440, 0x7F8ACA0173F0, 0x10) = 5 0

77524/0xe0324: pipe(0x7FFF6BEED438, 0x7F8ACA0173F0, 0x6) = 7 0

77524/0xe0324: pipe(0x7FFF6BEED430, 0x7F8ACA0173F0, 0x8) = 9 0

77524/0xe0324: pipe(0x7FFF6BEED428, 0x7F8ACA0173F0, 0xA) = 11 0

= 0 0

77524/0xe0324: open("/usr/share/servermgrd/bundles/servermgr_postgres.bundle/Contents/Info.plist\0", 0x0, 0x1B6) = 3 0

77524/0xe0324: open("/usr/sbin/serveradmin\0", 0x0, 0x1FF) = 3 0

...

77525/0xe033b: open("/var/db/launchd.db/com.apple.launchd/overrides.plist\0", 0x220, 0x180) = 3 0

77525/0xe033b: open("/var/db/launchd.db/com.apple.launchd/overrides.plist\0", 0x0, 0x1B6) = 4 0

77525/0xe033b: read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>com.apple.AEServer</key>\n\t<dict>\n\t\t<key>Disabled</key>\n\t\t<false/>\n\t</dict>\n\t<ke", 0x19CD) = 6605 0

... = 0 0

77525/0xe033b: open("/System/Library/LaunchDaemons/org.postgresql.postgres.plist\0", 0x0, 0x1B6) = 4 0

77525/0xe033b: read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>GroupName</key>\n\t<string>_postgres</string>\n\t<key>Label</key>\n\t<string>org.post", 0x4D1) = 1233 0

...

77525/0xe033b: socket(0x1, 0x1, 0x0) = 4 0

77525/0xe033b: fcntl_nocancel(0x4, 0x2, 0x1) = 0 0

77525/0xe033b: connect_nocancel(0x4, 0x7FFF65119DB0, 0x6A) = -1 Err#61

77525/0xe033b: close_nocancel(0x4) = 0 0

77525/0xe033b: getrlimit(0x1008, 0x7FFF65119900, 0x80) = 0 0

77525/0xe033b: write_nocancel(0x2, "launch_msg(): Socket is not connected\n\0", 0x26) = 38 0

...

77525/0xe033b: open("/bin/launchctl\0", 0x0, 0x1FF) = 4 0

...

77525/0xe033b: open("/var/db/launchd.db/com.apple.launchd/overrides.plist\0", 0x601, 0x1B6) = 4 0

77525/0xe033b: write(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>com.apple.AEServer</key>\n\t<dict>\n\t\t<key>Disabled</key>\n\t\t<false/>\n\t</dict>\n\t<ke", 0x19CD) = 6605 0

77524/0xe0324: select(0xA, 0x7FFF6BEED540, 0x7FFF6BEED4C0, 0x0, 0x0) = 2 0

77524/0xe0324: read(0x9, "abauthd</key>\n\t<dict>\n\t\t<key>Disabled</key>\n\t\t<false/>\n\t</dict>\n\t<key>com.apple.collabcored1</key>\n\t<dict>\n\t\t<key>Disabled</key>\n\t\t<false/>\n\t</dict>\n\t<key>com.apple.collabcored2</key>\n\t<dict>\n\t\t<key>Disabled</key>\n\t\t<false/>\n\t</dict>\n\t<key>com.apple.collab", 0x1000) = 0 0

77524/0xe0324: close(0x9) = 0 0

77524/0xe0324: kill(0xFFFED12B, 0x9, 0x1) = -1 Err#1

77524/0xe0324: wait4(0x12ED5, 0x7FFF6BEED45C, 0x0) = 77525 0

77524/0xe0324: geteuid(0x7FFF6BEEDB90, 0x400, 0x7F8AC8D00010) = 0 0

77524/0xe0324: socket(0x1, 0x1, 0x0) = 5 0

..

= 1 0

77524/0xe0324: read(0x9, "launch_msg(): Socket is not connected\n\0", 0x1000) = 38 0

...

77524/0xe0324: write_nocancel(0x1, "postgres:error = \"CANNOT_START_SERVICE_TIMEOUT_ERR\"\n\0", 0x34) = 52 0
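A trace like this is long and the failing call is easy to miss. Here is a minimal sketch for pulling out only the syscalls that returned an error, assuming the dtruss output was saved to a file first. The function name failed_syscalls and the file name trace.txt are made up for illustration. Run against the trace above it would keep the connect_nocancel line (Err#61, which is "connection refused" on Mac OS X) and the kill line (Err#1, "operation not permitted").

```shell
# Print only the trace lines whose return value is -1 (a failed syscall).
# Reads dtruss-style output on stdin and writes the matching lines to stdout.
failed_syscalls() {
    grep -E '=[[:space:]]*-1[[:space:]]+Err#[0-9]+'
}

# Usage (trace.txt is a hypothetical file holding the saved trace):
#   failed_syscalls < trace.txt
```

The connect_nocancel failure sits right before launchctl writes "launch_msg(): Socket is not connected", so that is the line worth looking at first.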

Jan 14, 2012 7:26 AM in response to Marco V

Marco-


Hmmmm. So I took some of the launch sequence cues from your dtruss output, and it raises a couple of questions.


First, let me say that there appear to be a lot of miscellaneous permissions consistency issues with OS X server, and we've all had to tinker with the 'repair' process to try to overcome them. I've had issues where Postgres fails to start because for some inexplicable reason the /var/log folder got locked down (it's covered in the thread above).


So I checked my machine for the various plist files highlighted above, and I'd bet a broken iPhone that we'll see different permissions + owner +group settings on various machines out there.


-rw-r--r-- 1 root wheel 1208 Oct 14 09:26 /usr/share/servermgrd/bundles/servermgr_postgres.bundle/Contents/Info.plist

-rw------- 1 root wheel 7461 Jan 13 20:31 /var/db/launchd.db/com.apple.launchd/overrides.plist

-rw-r--r-- 1 root wheel 1272 Dec 21 13:40 /System/Library/LaunchDaemons/org.postgresql.postgres.plist

drwxr-x--- 4 _postgres _postgres 136 Dec 21 14:07 /var/pgsql_socket
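If anyone wants to compare against their own machine, here is a rough sketch that prints mode, owner and group for the same four paths in one go. The function name perm_report is made up; the paths are the ones listed above. Some of them are only readable by root, so run it from a root shell (for example after sudo -s), otherwise they may show up as MISSING.

```shell
# Print "mode owner group path" for each path given, one per line.
# Paths that cannot be found are reported as MISSING instead of aborting.
perm_report() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            # ls -ld columns: mode, link count, owner, group, ...
            # substr() drops any trailing ACL/xattr marker after the mode.
            ls -ld "$p" | awk -v path="$p" '{ print substr($1, 1, 10), $3, $4, path }'
        else
            echo "MISSING $p"
        fi
    done
}

perm_report \
    /usr/share/servermgrd/bundles/servermgr_postgres.bundle/Contents/Info.plist \
    /var/db/launchd.db/com.apple.launchd/overrides.plist \
    /System/Library/LaunchDaemons/org.postgresql.postgres.plist \
    /var/pgsql_socket
```

Posting that output from a working and a failing server side by side would make any difference in ownership obvious.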


Have you noticed that you get the /var/pgsql.pre-restore-2011-12-21_13:42:26_EST type recovery directories when you have a failure? I have also found a 1:1 correlation that there are coinciding recovery directories here:

/etc/certificates.before-restore-Sun_Dec_18_14:29:22_2011

/etc/apache2.before-restore-Wed_Dec_21_13:43:41_2011


Have you noticed those either?


Ownerships also tell a story:

drwxr-xr-x 42 root wheel 1428 Dec 21 13:43 /etc/apache2.before-restore-Wed_Dec_21_13:43:41_2011

drwxr-xr-x 54 root wheel 1836 Dec 10 17:47 /etc/certificates.before-restore-Wed_Dec_21_13:43:41_2011

drwx------ 17 _postgres _postgres 578 Dec 21 13:42 /var/pgsql.pre-restore-2011-12-21_13:42:26_EST


I believe that the failure to start postgres is caused by a race condition when it's trying to start. Your observation about Lion kicking off Server Admin or Admin apps on restart and postgres failing ... may also be related.


When I manually shut these three services down with the serveradmin utility, move/rename the new respective folders, then rename the "pre restore" folders to the original names and manually restart the services, everything comes back without data loss.
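For the postgres part only, the sequence looks roughly like the sketch below. It only prints the commands so they can be reviewed first; nothing is executed. Assumptions: the data directory is /var/pgsql (as in the pre-restore folder name above), the function name restore_plan and the ".broken-" suffix are made up, and the pre-restore folder name has to be replaced with the one on your own machine. The Apache and certificates folders would need the same treatment, which this sketch does not cover.

```shell
# Print (do not run) the commands for swapping a pre-restore PostgreSQL
# directory back into place. Review the output before running anything as root.
#   $1 = pre-restore directory found on the machine
restore_plan() {
    pre="$1"
    stamp=$(date +%Y%m%d%H%M%S)   # keeps the set-aside copy uniquely named
    echo "serveradmin stop postgres"
    echo "mv /var/pgsql /var/pgsql.broken-$stamp"
    echo "mv $pre /var/pgsql"
    echo "serveradmin start postgres"
}

restore_plan "/var/pgsql.pre-restore-2011-12-21_13:42:26_EST"
```

Setting the current folder aside instead of deleting it means the swap can be undone if the older data turns out to be the wrong one.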


I was going to upgrade one of my work Mac Pros to Lion Server to use it as a corporate podcast processing server ... but not after this nightmare at home.

Jan 14, 2012 7:32 AM in response to Ocean Digital

Ocean Digital - if you are running Lion or Lion Server you are running Postgres ... it's under the covers.


Services like the ones you mention all depend on it. It was MySQL up to and including Snow Leopard.


If you kick off a terminal session and type the command:


sudo serveradmin status postgres

<password>


You should see something like:

postgres:state = "RUNNING"
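If you want to script that check, for example to run it after every reboot, a small sketch like this pulls the state out of that line. It assumes the output format shown above; the function name pg_state is made up.

```shell
# Extract the state value from a line such as: postgres:state = "RUNNING"
# Reads serveradmin status output on stdin; prints nothing if no state line is found.
pg_state() {
    sed -n 's/^postgres:state = "\(.*\)"$/\1/p'
}

# Usage, from a root shell or with sudo:
#   state=$(serveradmin status postgres | pg_state)
#   [ "$state" = "RUNNING" ] || echo "postgres is not running (state: $state)"
```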

Likewise, if you run:

ps -aef | grep post


You should see all the postgres processes running:

216 2937 1 0 21Dec11 ?? 4:16.12 /usr/bin/postgres_real -D /var/pgsql -c listen_addresses=127.0.0.1 -c log_connections=on -c log_directory=/Library/Logs -c log_filename=PostgreSQL.log -c log_line_prefix=%t -c log_lock_waits=on -c log_statement=ddl -c logging_collector=on -c unix_socket_directory=/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770

216 2939 2937 0 21Dec11 ?? 0:45.37 postgres: logger process

216 2941 2937 0 21Dec11 ?? 2:55.24 postgres: writer process

216 2942 2937 0 21Dec11 ?? 1:53.97 postgres: wal writer process

216 2943 2937 0 21Dec11 ?? 2:35.56 postgres: autovacuum launcher process

216 2944 2937 0 21Dec11 ?? 6:15.48 postgres: stats collector process

216 3021 2937 0 21Dec11 ?? 2:19.35 postgres: collab collab [local] idle

216 3086 2937 0 21Dec11 ?? 0:00.47 postgres: caldav caldav [local] idle

216 3087 2937 0 21Dec11 ?? 0:00.33 postgres: caldav caldav [local] idle

216 3088 2937 0 21Dec11 ?? 0:00.28 postgres: caldav caldav [local] idle

216 3089 2937 0 21Dec11 ?? 0:00.49 postgres: caldav caldav [local] idle

216 3222 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3223 2937 0 21Dec11 ?? 0:00.25 postgres: _devicemgr device_management [local] idle

216 3224 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3226 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3227 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3228 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3229 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 3230 2937 0 21Dec11 ?? 0:00.23 postgres: _devicemgr device_management [local] idle

216 3231 2937 0 21Dec11 ?? 0:00.25 postgres: _devicemgr device_management [local] idle

216 3232 2937 0 21Dec11 ?? 0:00.24 postgres: _devicemgr device_management [local] idle

216 4069 2937 0 23Dec11 ?? 0:00.40 postgres: caldav caldav [local] idle

216 4070 2937 0 23Dec11 ?? 0:01.06 postgres: caldav caldav [local] idle

216 4077 2937 0 23Dec11 ?? 0:00.42 postgres: caldav caldav [local] idle

216 14096 2937 0 28Dec11 ?? 0:00.43 postgres: caldav caldav [local] idle

216 34078 2937 0 31Dec11 ?? 0:00.16 postgres: caldav caldav [local] idle

216 34079 2937 0 31Dec11 ?? 0:00.10 postgres: caldav caldav [local] idle

1032 21302 20597 0 10:29AM ttys000 0:00.00 grep post


Ocean Digital wrote:


I'm not running Postgres. The services I HAD on were OD, VPN and file sharing.


It would be nice if the Server could actually serve something.

Jan 14, 2012 11:50 AM in response to Brian Brumfield

I sure don't have the knowledge you others do but I did run into something else regarding using the Root account on Lion (not server) that may, or may not be of use.


It seems that Lion, and I presume Lion Server also, will allow a user to enable the Root account one time from Directory Utility - Edit. You can then log in as root, do what is needed and disable Root again. Any subsequent attempt to use Root will fail: it lets you enable Root and shows it as enabled, but you get the head-shake rejection, as if the password were bad, when you actually try to log in as Root.


Working with my Higher Ed Tech Rep, he came up with something that not only allowed Root to log in but, more importantly, kept the ability to enable, disable and re-enable Root and actually log in working normally, even after cold restarts, logging off and on, and warm restarts. This was done by logging in on an Admin account, opening Terminal and entering "dsenableroot". Once that was done, all the Root enable, disable and login problems were cleared.


Perhaps you might want to try that just to see if, by some wild chance, it cleans up the Postgres problem also, since that seems to be related to the Root account being accepted as valid with the correct permissions etc.

Jan 14, 2012 4:42 PM in response to Marco V

@ Marco V:


You'll want to study launchd and services in Mac OS X a little further.

E.g., https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man8/launchd.8.html#//apple_ref/doc/man/8/launchd


When you invoke a command via sudo as admin, you're running that command effectively as root.

From the manpage for sudo:

-u user The -u (user) option causes sudo to run the specified command as a user other
than root.


But if you load a launchd plist, there are options for specifying the user, as we find in com.apple.collabd.plist :

<key>UserName</key>

<string>_teamsserver</string>


If attempting to load a launchd plist doesn't work, then one wants to investigate why, but if you attempt to invoke the associated service manually, then extra precautions might be necessary.

I'd suggest first checking that the service is running, via


ps -U _teamsserver


or
sudo ps auxww | grep collabd


As for anyone "needing" to do anything as root, I've rarely (next to never) needed to, sudo meeting the need.
