Lynda Leung Author

Level 1

0 points

Mavericks mail server stops distributing group email after a few hours of usage

I've been having problems with group mail on our Mavericks mail server ever since our upgrade.

The server will distribute group email for a few hours, then it simply stops working.
No error message, no bounces, and the server keeps the incoming mail in its incoming queue.

Restarting the mail services, either through Server.app or from the command line with serveradmin stop mail; serveradmin start mail, is the only way I can get it to resume distributing the email. I suspect it has to do with postfix itself hiccuping? But this is a reproducible problem.

Mail filtering is on, score set to 6, 10 mb limit, spamhaus is active.

Looking for tips on how to stabilize this service short of setting up a cron job to restart the services every few hours.

Mac mini, OS X Mavericks (10.9), OpenLDAP and Server.app user

Posted on Nov 1, 2013 10:43 AM

40 replies

Apr 16, 2014 12:21 PM in response to Lynda Leung

We experienced the same issue. Upon further review (and after much troubleshooting) the process that appears to be causing the headaches is the "list_server_mgr" process that spawns the agents responsible for delivering the messages. The quick and dirty solution to this (until it is fixed in Server.app) is to just run a script that executes: "killall list_server_mgr" at some interval you may determine, which will re-initiate that managing service to spawn the processes that deliver the messages. This is a less heavy-handed approach to offlining the entire Mail environment and then bringing it back up again.

Here's hoping for a fix well before Server.app v3.2.

Miggl

Level 1

89 points

Apr 16, 2014 12:29 PM in response to Joey Appleseed

Thank you for this further insight! According to your recommendation, I have updated my AppleScript to the following:

repeat
    do shell script "killall list_server_mgr" password "<server admin pwd>" with administrator privileges
    delay 900
end repeat

This will run every 15 minutes.

Miggl

Level 1

89 points

Apr 17, 2014 10:28 AM in response to Miggl

This ended up not working for me, so I reverted back to the original "reset" script.

Miggl

Level 1

89 points

Apr 17, 2014 11:52 AM in response to Miggl

Joey Appleseed's report caused me to look a bit more into the list_server_mgr aspect of things. My List Server Log shows the following:

Apr 17 11:44:58 genealabs.com list_server_post[11758] <Info>: message: 1397760298.343242.DED78BC8F0360958.msg from: xxxxxxx@gmail.com posted to: info (size=11639)

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: list agent wake from sleeping

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: connect to: 127.0.0.1 [port 25]

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: SMTP write returned error: (null)

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: write command: "

.

" failed

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: disconnect to: 127.0.0.1

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: message deliver error: 1396990516.086732.3583944666D7530B.msg to list: info

Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: list agent sleeping for 300 seconds

This tells me there is something wrong in this area, but I'm not sure where to go from here. I have tested that port 25 is live and well using telnet 127.0.0.1 25, as well as testing the connection from outside of my network.

Any suggestions how to troubleshoot this error? I've googled for it, but haven't seen anything posted on it.
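One way to start narrowing this down is to check whether the same message keeps failing. In the excerpt above, the message in the "message deliver error" line (1396990516...) is not the one that was just posted (1397760298...), which may mean an older message is being retried. The following is only a rough sketch: the function name failed_messages is made up for illustration, and the log file path is passed in as an argument because the thread does not say where the List Server Log lives on disk.

```shell
#!/bin/bash

# failed_messages (hypothetical helper): list each message file named in a
# "message deliver error" line, with the number of times it appears.
# $1: path to a saved copy of the List Server Log (path is an assumption).
failed_messages() {
    sed -n 's/.*list_server_agent\[[0-9]*] <Error>: message deliver error: \([^ ]*\) to list: .*/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# Example: failed_messages /path/to/list_server.log
```

A message that shows up with a high count is a candidate for closer inspection, since the agent retries after each sleep interval.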

Apr 25, 2014 8:41 AM in response to Joey Appleseed

It appears that list_server_mgr spawns some number of list_server_agent processes when there are group messages to be sent out. These agents stick around if there is always some mailing list activity. However, if there is a dearth of messages for a few minutes, the agents terminate automatically.

list_server_mgr creates two pipes to communicate with the agent processes. In my case, when the agents terminate, the pipes in list_server_mgr are not closed, and become orphaned in the FD list (you can observe this with lsof). After 11 agent processes have exited, all mailing list activity stops. When mailing list activity is dead, try running this command, and count the number of PIPE lines:

lsof -p `ps -A | grep 'list_server_mgr' | grep -v grep | awk '{print $1}'`
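If you would rather get the number directly than count lines by eye, something like the following might help. It is a minimal sketch, assuming lsof prints the descriptor type in its fifth column (TYPE), as it does in the default output format; count_pipes is a made-up name, not part of any tool.

```shell
#!/bin/bash

# count_pipes (hypothetical helper): read lsof output on stdin and print
# how many rows have PIPE in the TYPE column (5th field).
count_pipes() {
    awk '$5 == "PIPE" { n++ } END { print n + 0 }'
}

# Example, run as root:
#   lsof -p "$(ps -A | grep 'list_server_mgr' | grep -v grep | awk '{print $1}')" | count_pipes
```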

If I see 22 of those, and if all of the agents have exited, I think it is time to kick the mailing list process. I've created a script called listkicker.sh on my server, and run it as root in the background like this:

sudo ./listkicker.sh &

The code for the script is as follows:

#!/bin/bash

# Watch list_server_mgr and restart it once its pipes are used up.
echo "Starting mailing list kicker with process ID $$" | logger

while true
do
    mgr_pid=$(ps -A | grep 'list_server_mgr' | grep -v grep | awk '{print $1}')
    # Count the pipe descriptors the manager currently holds open.
    numpipes=$(lsof -p "${mgr_pid}" | grep PIPE | wc -l | awk '{print $1}')

    if [ "$numpipes" -gt 21 ]
    then
        # Only restart when no agents are still running.
        numagents=$(ps -A | grep -v grep | grep list_server_agent | wc -l | awk '{print $1}')
        if [ "$numagents" -lt 1 ]
        then
            echo "Restarting list manager due to stupid bug." | logger
            killall list_server_mgr
        else
            echo "Not restarting list manager with ${numagents} agent(s) running." | logger
        fi
    fi

    sleep 60
done

Miggl

Level 1

89 points

Apr 25, 2014 8:46 AM in response to Dustin Wenz

Thanks, Dustin, this might come in handy. In my case, I have found that certain incoming emails can hold up the queue, causing no further messages to be delivered. Removing the offending messages one at a time from the inbox queue frees things up. I have sent a sample of these messages to Apple's Server Engineering team, and they have acknowledged the problem and said they are working on it.

I hope this includes all issues listed in this thread, as I'm guessing we are seeing more than one issue here.

As long as I keep the inbox clog-free I am receiving list emails and don't have to run any scripts.

I am monitoring the following folder for "clogs":

/Library/Server/Mail/Data/listserver/messages/inbound/<list guid>/

And keeping an eye on these folders:

/Library/Server/Mail/Data/listserver/messages/hold/
/Library/Server/Mail/Data/listserver/messages/error/
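For anyone who wants to check these folders from a terminal rather than Finder, here is a rough sketch that prints how many files sit in each one. The base path is the one given above; the function name queue_counts and the LIST_BASE variable are made up for illustration, and the script only reads the folders, it does not move or delete anything.

```shell
#!/bin/bash

# Base path taken from the folder list above; override LIST_BASE if yours differs.
LIST_BASE="${LIST_BASE:-/Library/Server/Mail/Data/listserver/messages}"

# queue_counts (hypothetical helper): print "<count> <folder>" for the
# inbound, hold and error folders under the given base path.
queue_counts() {
    local base="$1" sub
    for sub in inbound hold error; do
        if [ -d "$base/$sub" ]; then
            # find -type f descends into subfolders, so files inside the
            # per-list <list guid> folders under inbound/ are counted too.
            printf '%s %s\n' "$(find "$base/$sub" -type f | wc -l | tr -d ' ')" "$sub"
        fi
    done
}

# Example, run as root: queue_counts "$LIST_BASE"
```

An inbound count that stays above zero while no list mail goes out would be consistent with the "clog" described here.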

The emails I need to remove to unclog the system go into a bad_emails folder on my desktop, and I send that folder to the engineering team via the Server > Provide Server Feedback ... menu item.

Apr 25, 2014 8:54 AM in response to Miggl

This is very interesting... I wonder if you are experiencing a different problem than we are?

Have you ever tried removing one of the bad emails, and then dropping it back into the queue?

I've reported my issue to bugreport.apple.com, but I haven't had any response to it.

Miggl

Level 1

89 points

Apr 25, 2014 8:58 AM in response to Dustin Wenz

Yes, I did. And the behavior was always the same. I'm convinced that the mail server is unable to parse a certain type of email format, as I probably have 20 or so of these bad boys now accumulated.

Try sending to: OS-X-Server-Feedback@group.apple.com, which is the server's bug report email.

Lynda Leung Author

Level 1

0 points

Jun 1, 2014 10:19 PM in response to Lynda Leung

Apparently Server version 3.1.2 has fixed this issue.
I've been running it for a couple of days without my crontab script, and group mail is still being delivered.
Also, old messages that had been caught in the queue are being flooded to the recipients, so that's something to consider and warn your groups about when you do this update.

Anyways, case closed, and about freakin' time.

Sep 17, 2014 5:01 PM in response to Lynda Leung

I just updated to 10.9.4 from 10.7.5 and have Server.app 3.1.2 installed and running, and I'm still having the issue. It's really becoming a problem 😟
