Lynda Leung

Q: Mavericks mail server stops distributing group email after a few hours of usage

I've been having problems with group mail on our Mavericks mail server ever since our upgrade.

 

The server will distribute group email for a few hours, then it simply stops working.
There is no error message and there are no bounces; the server just keeps the incoming mail in its incoming queue.

 

Restarting the mail service, either through Server.app or with serveradmin stop mail; serveradmin start mail on the command line, is the only way I can get it to resume distributing the email.  I suspect postfix itself is hiccuping, but this is a reproducible problem.

 

Mail filtering is on, score set to 6, 10 MB limit, Spamhaus is active.

 

Looking for tips on how to stabilize this service, short of setting up a cron job to restart the services every few hours.

Mac mini, OS X Mavericks (10.9), OpenLDAP and Server.app user

Posted on Nov 1, 2013 10:43 AM

  • by FromOZ,

    FromOZ, Apr 6, 2014 12:54 AM in response to curriebwoi
    Level 3 (545 points)

    curriebwoi wrote:

     

    I am having a similar problem, where can I locate this crontab script?

     

    See Lynda's post earlier

     

    This is how I worked around it: create a crontab entry as root to restart the service every 6 hours.  Best to use an interval that divides evenly into 24, for obvious reasons.

     

     

    Use this single-line entry:
    0 */6 * * * /Applications/Server.app/Contents/ServerRoot/usr/sbin/serveradmin stop mail >> /dev/null ;  /Applications/Server.app/Contents/ServerRoot/usr/sbin/serveradmin start mail >> /dev/null

  • by Joey Appleseed,

    Joey Appleseed, Apr 16, 2014 12:21 PM in response to Lynda Leung
    Level 1 (0 points)

    We experienced the same issue. Upon further review (and after much troubleshooting) the process that appears to be causing the headaches is the "list_server_mgr" process that spawns the agents responsible for delivering the messages. The quick and dirty solution to this (until it is fixed in Server.app) is to just run a script that executes: "killall list_server_mgr" at some interval you may determine, which will re-initiate that managing service to spawn the processes that deliver the messages. This is a less heavy-handed approach to offlining the entire Mail environment and then bringing it back up again.
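
A minimal sketch of such a script, assuming it runs as root and that pgrep is available on the server. Only the process name list_server_mgr comes from the post above; the function name and messages are made up for illustration.

```shell
#!/bin/bash
# Sketch: restart only the list manager, and only if it is running.
# PROC is the process named in the post; the rest is illustrative.
PROC="list_server_mgr"

kick_list_mgr() {
    # pgrep -x matches the exact process name
    if pgrep -x "$PROC" >/dev/null 2>&1; then
        # killing the manager lets it be respawned, per the post above
        killall "$PROC" && echo "kicked $PROC"
    else
        echo "$PROC not running"
    fi
}

kick_list_mgr
```

Saved somewhere root can execute it, this could be called from root's crontab at whatever interval suits the list traffic.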

     

    Here's hoping for a fix well before Server.app v3.2.

  • by Miggl,

    Miggl, Apr 16, 2014 12:29 PM in response to Joey Appleseed
    Level 1 (77 points)

    Thank you for this further insight! According to your recommendation, I have updated my AppleScript to the following:

     

    repeat
        do shell script "killall list_server_mgr" password "<server admin pwd>" with administrator privileges
        delay 900
    end repeat

     

     

    This will run every 15 minutes.

  • by Miggl,

    Miggl, Apr 17, 2014 10:28 AM in response to Miggl
    Level 1 (77 points)

    This ended up not working for me, so I reverted to the original "reset" script.

  • by Miggl,

    Miggl, Apr 17, 2014 11:52 AM in response to Miggl
    Level 1 (77 points)

    Joey Appleseed's report caused me to look a bit more into the list_server_mgr aspect of things. My List Server Log shows the following:

     

     

    Apr 17 11:44:58 genealabs.com list_server_post[11758] <Info>: message: 1397760298.343242.DED78BC8F0360958.msg from: xxxxxxx@gmail.com posted to: info (size=11639)
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: list agent wake from sleeping
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: connect to: 127.0.0.1 [port 25]
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: SMTP write returned error: (null)
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: write command: "
              .
              " failed
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: disconnect to: 127.0.0.1
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Error>: message deliver error: 1396990516.086732.3583944666D7530B.msg to list: info
    Apr 17 11:45:01 genealabs.com list_server_agent[10789] <Info>: list agent sleeping for 300 seconds

     

    This tells me there is something wrong in this area, but I'm not sure where to go from here. I have tested that port 25 is live and well using telnet 127.0.0.1 25, as well as testing the connection from outside of my network.

     

    Any suggestions how to troubleshoot this error? I've googled for it, but haven't seen anything posted on it.
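
One detail in the excerpt above may help narrow this down: the message that fails to deliver (1396990516...msg) is not the one that was just posted (1397760298...msg). If the leading number is an epoch timestamp, which the match with the log time suggests but is an assumption here, the failing message is about nine days older. A rough sketch for tallying which message files keep failing, reading log text on stdin so no log path has to be assumed; the function name is made up:

```shell
#!/bin/bash
# failing_messages: read list server log text on stdin and print
# "<count> <message file>" for every "message deliver error" entry,
# most frequent first.
failing_messages() {
    sed -n 's/.*message deliver error: \([^ ]*\) to list: .*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Feed it the List Server Log, for example by pasting the log text or redirecting the log file into failing_messages. A message file that shows up on every agent wake-up is a candidate for closer inspection.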

  • by Dustin Wenz,

    Dustin Wenz, Apr 25, 2014 8:41 AM in response to Joey Appleseed
    Level 2 (215 points)

    It appears that list_server_mgr spawns some number of list_server_agent processes when there are group messages to be sent out. These agents stick around if there is always some mailing list activity. However, if there is a dearth of messages for a few minutes, the agents terminate automatically.

     

     

    list_server_mgr creates two pipes to communicate with the agent processes. In my case, when the agents terminate, the pipes in list_server_mgr are not closed, and become orphaned in the FD list (you can observe this with lsof). After 11 agent processes have exited, all mailing list activity stops. When mailing list activity is dead, try running this command, and count the number of PIPE lines:

    lsof -p `ps -A | grep 'list_server_mgr' | grep -v grep | awk '{print $1}'`

     

    If I see 22 of those, and if all of the agents have exited, I think that is time to kick the mailing list process. I've created a script called listkicker.sh on my server, and run it as root in the background like this:

    sudo ./listkicker.sh &

     

    The code for the script is as follows:

     

    #!/bin/bash
    # Restart list_server_mgr once it has leaked 22 pipes and no agents are left.

    echo "Starting mailing list kicker with process ID $$" | logger

    while true
    do
        # PID of the list manager (empty if it is not running)
        mgr_pid=`ps -A | grep 'list_server_mgr' | grep -v grep | awk '{print $1}'`
        # Number of PIPE entries in the manager's open file list
        numpipes=`lsof -p ${mgr_pid} | grep PIPE | wc -l | awk '{print $1}'`

        if [ "$numpipes" -gt 21 ]
        then
            # Only restart when no delivery agents are still running
            numagents=`ps -A | grep -v grep | grep list_server_agent | wc -l | awk '{print $1}'`

            if [ "$numagents" -lt 1 ]
            then
                echo "Restarting list manager due to stupid bug." | logger
                killall list_server_mgr
            else
                echo "Not restarting list manager with ${numagents} agent(s) running." | logger
            fi
        fi

        sleep 60
    done
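
The restart condition in the script above can also be pulled out into two small functions, so it can be checked against saved lsof output without touching a live server. This is a sketch, not part of the original script: the function names are made up, and it assumes TYPE is the fifth column of lsof's default output.

```shell
#!/bin/bash
# count_pipes: read lsof output on stdin, print the number of rows
# whose TYPE column (assumed to be field 5) is PIPE.
count_pipes() {
    awk '$5 == "PIPE" { n++ } END { print n + 0 }'
}

# should_kick: succeed when the thresholds from the script are met.
#   $1 = number of PIPE rows, $2 = number of running list_server_agent processes
should_kick() {
    [ "$1" -gt 21 ] && [ "$2" -lt 1 ]
}
```

For example, lsof -p with the manager's PID piped into count_pipes gives the first argument for should_kick.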

  • by Miggl,

    Miggl, Apr 25, 2014 8:46 AM in response to Dustin Wenz
    Level 1 (77 points)

    Thanks, Dustin, this might come in handy. In my case, I have found that certain incoming emails can hold up the queue, so that no further messages are delivered. Removing the offending messages one at a time from the inbound queue frees things up. I have sent a sample of these messages to Apple's Server Engineering team; they have acknowledged the problem and said they are working on it.

     

    I hope this includes all issues listed in this thread, as I'm guessing we are seeing more than one issue here.

     

    As long as I keep the inbox clog-free I am receiving list emails and don't have to run any scripts.

     

    I am monitoring the following folder for "clogs":

    • /Library/Server/Mail/Data/listserver/messages/inbound/<list guid>/

     

    And keeping an eye on these folders:

    • /Library/Server/Mail/Data/listserver/messages/hold/
    • /Library/Server/Mail/Data/listserver/messages/error/
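
Watching these folders can be partly automated. A rough sketch, assuming that a message file sitting untouched for a while counts as a possible clog; the 30-minute default and the function name are made-up choices, not anything Server.app defines:

```shell
#!/bin/bash
# list_stale: print regular files under directory $1 that have not been
# modified in the last $2 minutes (default 30, an arbitrary choice).
list_stale() {
    local dir="$1" minutes="${2:-30}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -mmin +"$minutes" -print
}
```

Pointing it at /Library/Server/Mail/Data/listserver/messages/inbound covers every list's subfolder, since find descends into them.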

     

    I put the emails I need to remove to unclog the system in a bad_emails folder on my desktop, and send that folder to the engineering team via the Server > Provide Server Feedback ... menu item.

  • by Dustin Wenz,

    Dustin Wenz, Apr 25, 2014 8:54 AM in response to Miggl
    Level 2 (215 points)

    This is very interesting... I wonder if you are experiencing a different problem than we are?

     

    Have you ever tried removing one of the bad emails, and then dropping it back into the queue?

     

    I've reported my issue to bugreport.apple.com, but I haven't had any response to it.

  • by Miggl,

    Miggl, Apr 25, 2014 8:58 AM in response to Dustin Wenz
    Level 1 (77 points)

    Yes, I did. And the behavior was always the same. I'm convinced that the mail server is unable to parse a certain type of email format, as I probably have 20 or so of these bad boys now accumulated.

     

    Try sending to: OS-X-Server-Feedback@group.apple.com, which is the server's bug report email.

  • by Lynda Leung, Solved answer

    Lynda Leung, Jun 1, 2014 10:19 PM in response to Lynda Leung
    Level 1 (0 points)

    Apparently Server version 3.1.2 has fixed this issue.
    I've been running it for a couple of days without my crontab script, and group mail is still being delivered.
    Also, old messages that had been caught in the queue are being flooded to the recipients, so that's something to consider and warn your groups about when you do this update.

     

    Anyways, case closed, and about freakin' time.

  • by MMimsGTKey,

    MMimsGTKey, Sep 17, 2014 5:01 PM in response to Lynda Leung
    Level 1 (0 points)

    I just updated from 10.7.5 to 10.9.4 and have Server app 3.1.2 installed and running, and I'm still having the issue. It's really becoming a problem.
