little stops in the Mail Server

Question

Level 1

0 points

Hi,

I've noticed that one of my mail servers stalls for approximately 10 minutes and then returns to normal 😟. I don't know why. When I check the logs, everything looks normal. It happens every hour, for about 10 minutes.

I've attached a graph of CPU and network activity; the stalls are visible in it.

http://dathvader.galenics.net/stop.png

I have rebooted the machine several times, but nothing changed.

Where is the problem?

If I run the dmesg command, I see the following:

+or SCSI Domain = 2+
+Discovery condition = 0x00000000+
+Jettisoning kernel linker.+
+Resetting IOCatalogue.+
+Matching service count = 0+
+Matching service count = 0+
+Matching service count = 0+
+Matching service count = 0+
+Matching service count = 0+
+Matching service count = 0+
+Matching service count = 1+
+Xserve2,1: stalling for module+
+Xserve2,1: stalling for module+
+Xserve2,1: stalling for module+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+ACPI SMC_PlatformPlugin::getCPUPSSData - WARNING: _PSS table invalid; ACPI is probably incomplete+
+Previous Shutdown Cause: 5+
+Apple16X50ACPI1: Identified Serial Port on ACPI Device=UAR1+
+Apple16X50ACPI::start FOUND DB9 Property for AAPL,connector+
+Apple16X50UARTSync: Detected 16550AF/C/CF FIFO=16 MaxBaud=115200+
+Matching service count = 1+
AppleTyMCEDriver::probe(Xserve2,1)
+AppleTyMCEDriver::probe fails+
+AppleIntel8254XEthernet: Ethernet address 00:1e:52:f3:72:c0+
+AppleIntel8254XEthernet: Ethernet address 00:1e:52:f3:72:c1+
+Ethernet [Intel8254x]: Link up on en0, 1-Gigabit, Full-duplex, Symmetric flow-control, Debug [792d,af08,0de1,0e00,cde1,3800]+
+Ethernet [Intel8254x]: Link up on en1, 1-Gigabit, Full-duplex, No flow-control, Debug [792d,ac48,0de1,0e00,c1e1,7800]+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+ALF ALERT: sockwall cntlupdaterules ctl_enqueuedata rts err 55+
+serialnumberd 434 FS WRITEDATA SBF /dev/dtracehelper 13 (seatbelt)+
+serialnumberd 434 FS READDATA SBF /dev/autofs_nowait 13 (seatbelt)+
+serialnumberd 434 FS READDATA SBF /usr/sbin 13 (seatbelt)+
+ALF: ifnet get_address_listfamily error 12+
+Limiting closed port RST response from 276 to 250 packets per second+
+Limiting closed port RST response from 293 to 250 packets per second+
+Limiting closed port RST response from 263 to 250 packets per second+
+Limiting closed port RST response from 277 to 250 packets per second+
+Limiting closed port RST response from 262 to 250 packets per second+
+Limiting closed port RST response from 273 to 250 packets per second+

Any clue?

Thanks for everything.

Posted on May 11, 2010 1:59 AM

Answer 1

pterobyte

Level 6

11,098 points

May 11, 2010 3:55 AM in response to jonflas

Check which processes are taking up CPU time during slowdowns.

Answer 2

jonflas Author

Level 1

0 points

May 11, 2010 10:58 AM in response to pterobyte

Hi,

Thanks for the clue, pterobyte 😉.

Well, I put a simple command in the crontab to run every 5 minutes:

*ps aux | mail -s "Lista procesos" unidadtecnologica@cgcom.es*
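A variation on the same idea, as a minimal sketch: append each snapshot to a local file instead of mailing it, so the capture does not depend on the mail system under investigation. The function names and the log path in the usage note are hypothetical, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helpers for capturing CPU snapshots; names are illustrative only.

top_cpu() {
    # Read "ps aux" output on stdin and print the $1 busiest processes,
    # sorted by the %CPU column (column 3), highest first.
    # "tail -n +2" drops the header line before sorting.
    tail -n +2 | LC_ALL=C sort -k3,3 -rn | head -n "$1"
}

log_snapshot() {
    # Append a timestamped snapshot to the log file given as $1.
    # $2 is the number of processes to keep (defaults to 15).
    {
        date '+=== %Y-%m-%d %H:%M:%S ==='
        ps aux | top_cpu "${2:-15}"
    } >> "$1"
}
```

Saved in a script and run from cron, it could be called as `log_snapshot /var/log/cpu_snapshots.log 15` (the path is an assumption).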

The email just before the slowdown was the following:

+USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND+
+_amavisd 40862 40.8 2.1 772708 173592 ?? R 4:53PM 10:58.63 find /var/virusmails -type f -atime +30 -exec rm -r {} ;+
+_cyrus 46659 5.8 0.0 927936 1716 s003 S+ 5:24PM 0:00.11 /usr/bin/cyrus/bin/reconstruct -r -f user/perezlledo comaes+
+root 46674 3.2 0.0 87428 2172 ?? R 5:25PM 0:00.03 serveradmin settings mail:mailman:default emailhost+
+root 46663 1.8 0.0 600172 592 ?? Ss 5:25PM 0:00.01 /bin/bash /var/root/scripts/cleanlmtpd+
+root 46665 1.1 0.0 600172 592 ?? Ss 5:25PM 0:00.00 /bin/sh -c /var/root/scripts/recipient accessscript;/usr/sbin/postmap /mailstore/recipient_access+
+root 46664 1.1 0.0 600172 580 ?? Ss 5:25PM 0:00.00 /bin/sh -c ps aux | mail -s "lista procesos" unidadtecnologica@cgcom.es+
+root 46667 0.9 0.0 600172 612 ?? S 5:25PM 0:00.00 /bin/bash /var/root/scripts/recipient accessscript+
+root 46672 0.7 0.0 599740 376 ?? S 5:25PM 0:00.00 grep lmtpd+
+root 46670 0.7 0.0 600048 436 ?? R 5:25PM 0:00.00 ps aux+
+root 46666 0.6 0.0 599784 432 ?? R 5:25PM 0:00.00 ps aux+
+root 46673 0.6 0.0 591428 284 ?? S 5:25PM 0:00.00 wc -l+
+root 46675 0.5 0.0 599660 320 ?? S 5:25PM 0:00.00 sed -e s/[^"] "([^"]*)./\1/+
+root 46669 0.4 0.0 600172 384 ?? S 5:25PM 0:00.00 /bin/bash /var/root/scripts/cleanlmtpd+
+root 46671 0.4 0.0 600172 380 ?? S 5:25PM 0:00.00 /bin/bash /var/root/scripts/recipient accessscript+
+root 46436 0.4 0.0 601332 480 ?? Ss Mon05PM 4:38.60 /usr/sbin/syslogd+
+george 2298 0.4 0.1 232188 12512 ?? S Mon02PM 3:23.58 /Users/george/Desktop/Terminal.app/Contents/MacOS/Terminal -psn 053261+
+root 113 0.3 0.0 77076 2968 ?? Ss Mon01PM 12:42.89 hwmond+
...
...

As you can see, the process with the highest CPU usage belongs to amavisd. That seems normal to me, since it is processing and analysing mail ...

Now, the following listing is from the moment of the slowdown:

+USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND+
+root 46436 11.0 0.0 601336 484 ?? Us Mon05PM 4:39.09 /usr/sbin/syslogd+
+root 47077 7.7 0.0 599948 596 ?? U 5:31PM 0:00.04 ps aux+
+root 47128 7.4 0.0 599944 572 ?? U 5:35PM 0:00.03 ps aux+
+root 47132 7.4 0.0 599944 572 ?? R 5:35PM 0:00.03 ps aux+
+root 47091 7.2 0.0 599944 584 ?? U 5:32PM 0:00.04 ps aux+
+root 47116 7.2 0.0 600200 576 ?? U 5:34PM 0:00.03 ps aux+
+root 47058 7.1 0.0 599948 600 ?? U 5:30PM 0:00.05 ps aux+
+root 47038 7.0 0.0 599948 604 ?? U 5:29PM 0:00.05 ps aux+
+root 47063 6.8 0.0 600204 592 ?? U 5:30PM 0:00.04 ps aux+
+root 47104 6.6 0.0 600200 572 ?? R 5:33PM 0:00.03 ps aux+
+root 113 6.6 0.0 77076 2976 ?? Ss Mon01PM 12:50.83 hwmond+
+root 47193 3.0 0.0 599944 552 ?? R 5:40PM 0:00.02 ps aux+
+root 86 2.6 0.2 615524 18372 ?? Us Mon01PM 1:10.51 /usr/sbin/named -f+
+george 47045 2.5 0.0 77404 936 ?? U 5:29PM 0:00.03 /usr/bin/nmblookup HANSOLO+
+root 47197 2.5 0.0 599944 548 ?? U 5:40PM 0:00.02 ps aux+
+root 47181 2.4 0.0 600200 548 ?? U 5:39PM 0:00.02 ps aux+
+george 47097 2.4 0.0 77404 928 ?? U 5:32PM 0:00.03 /usr/bin/nmblookup -M workgroup+
+root 47210 2.4 0.0 599944 544 ?? U 5:41PM 0:00.01 ps aux+
+root 47156 2.3 0.0 599944 560 ?? R 5:37PM 0:00.02 ps aux+
+root 47144 2.1 0.0 599944 564 ?? U 5:36PM 0:00.03 ps aux+
+root 47168 2.0 0.0 599944 552 ?? R 5:38PM 0:00.02 ps aux+
+root 47120 1.6 0.0 87124 1236 ?? U 5:34PM 0:00.01 serveradmin settings mail:mailman:default emailhost+
+root 47081 1.6 0.0 87124 1236 ?? U 5:31PM 0:00.01 serveradmin settings mail:mailman:default emailhost+
+root 47095 1.5 0.0 87124 1236 ?? U 5:32PM 0:00.01 serveradmin settings mail:mailman:default emailhost+
...
...

You can see the syslogd process using 11% of the CPU. The rest of the processes are at 0% CPU usage!

Another ps aux from during the slowdown:

+USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND+
+root 46436 11.0 0.0 601336 484 ?? Rs Mon05PM 4:39.09 /usr/sbin/syslogd+
+root 47038 10.6 0.0 600204 612 ?? R 5:29PM 0:00.05 ps aux+
+root 47058 10.6 0.0 600204 608 ?? U 5:30PM 0:00.05 ps aux+
+root 47077 7.7 0.0 599948 600 ?? R 5:31PM 0:00.05 ps aux+
+root 47132 7.4 0.0 599944 576 ?? U 5:35PM 0:00.03 ps aux+
+root 47128 7.4 0.0 599944 576 ?? R 5:35PM 0:00.03 ps aux+
+root 47116 7.2 0.0 599944 580 ?? U 5:34PM 0:00.04 ps aux+
+root 47091 7.2 0.0 599948 596 ?? U 5:32PM 0:00.04 ps aux+
+root 47063 6.8 0.0 599948 600 ?? U 5:30PM 0:00.05 ps aux+
+root 113 6.6 0.0 77076 2976 ?? Ss Mon01PM 12:50.83 hwmond+
+root 47104 6.6 0.0 599944 580 ?? U 5:33PM 0:00.04 ps aux+
+root 47193 3.0 0.0 599944 560 ?? U 5:40PM 0:00.02 ps aux+
+root 47197 2.5 0.0 599944 548 ?? U 5:40PM 0:00.02 ps aux+
+root 47181 2.4 0.0 600200 552 ?? U 5:39PM 0:00.02 ps aux+
+root 47210 2.4 0.0 599944 544 ?? R 5:41PM 0:00.01 ps aux+
+george 47097 2.4 0.0 77404 928 ?? U 5:32PM 0:00.03 /usr/bin/nmblookup -M workgroup+
...
...

Again, syslogd is the process with 11% CPU usage. The rest of the processes are at 0% CPU.
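The STAT column in these listings may be worth a look too: on Mac OS X, `ps` marks a process in uninterruptible wait with U, and most of the entries captured during the stall carry it. A minimal sketch for tallying process states from saved `ps aux` output (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper; assumes the standard "ps aux" column layout,
# where STAT is the 8th column.

count_states() {
    # Read "ps aux" output on stdin, skip the header line, and count
    # processes by the first letter of the STAT column (R, S, U, ...).
    awk 'NR > 1 { n[substr($8, 1, 1)]++ }
         END    { for (s in n) print s, n[s] }' | LC_ALL=C sort
}
```

For example, `ps aux | count_states` prints one line per state with its count; a large share of U entries would suggest processes blocked waiting on something such as disk I/O rather than competing for CPU.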

Now, the moment just after the slowdown:

+USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND+
+root 82 24.4 1.4 345360 119536 ?? Ss Mon01PM 11:26.87 /usr/libexec/slapd -d 0 -h ldap:/// ldapi://%2Fvar%2Frun%2Fldapi+
+root 47558 6.9 0.0 77380 1576 ?? R 5:44PM 0:00.14 ldapsearch -x -b dc=dathvader,dc=galenics,dc=net uid= @ uid+
+root 47531 4.9 0.0 77380 1576 ?? R 5:44PM 0:00.22 ldapsearch -x -b dc=dathvader,dc=galenics,dc=net uid= @ uid+
+root 47557 4.5 0.0 603156 3720 ?? Us 5:44PM 0:00.08 /System/Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python /usr/share/mailman/bin/get listinfo -o /tmp/smdm4.lLzKHkzftakVka+
+_www 47334 3.3 0.1 2716700 4836 ?? S 5:44PM 0:00.25 /usr/sbin/httpd -D FOREGROUND+
+_cyrus 47572 3.1 0.0 928980 1588 ?? U 5:45PM 0:00.01 pop3d+
+_www 47390 1.8 0.1 2723356 10852 ?? S 5:44PM 0:00.29 /usr/sbin/httpd -D FOREGROUND+
+_amavisd 47305 1.8 0.0 599640 344 ?? U 5:44PM 0:00.20 find /var/amavis/tmp -type d -atime +30 -exec rm -r {} ;+
+_cyrus 47048 1.7 0.0 922960 1160 ?? U 5:29PM 0:00.05 ctl_cyrusdb -c+
+_cyrus 47567 1.7 0.0 928980 1588 ?? U 5:45PM 0:00.01 pop3d+
+root 47036 1.3 0.0 600304 824 ?? U 5:29PM 0:00.02 /bin/bash /var/root/scripts/recipient accessscript+
+_cyrus 43234 1.3 0.0 929676 2272 ?? U 4:59PM 0:00.67 imapd: localhost [127.0.0.1] fmartinez ffomcorg+
+_cyrus 41019 1.2 0.0 929500 2264 ?? U 4:53PM 0:00.69 imapd: localhost [127.0.0.1] miguelcorty comaes user.miguelcorty_co+
+_amavisd 44902 1.0 0.4 650920 37088 ?? U 5:10PM 0:05.32 amavisd (ch18-44902-18)+
...
...

The situation returns to normal ...

Do you see anything?

Maybe the problem is in the syslog daemon?

Thanks for your help and your time!

Bye!

Answer 3

pterobyte

Level 6

11,098 points

May 11, 2010 11:38 AM in response to jonflas

http://discussions.apple.com/thread.jspa?messageID=9469335&#9469335

Answer 4

jonflas Author

Level 1

0 points

May 12, 2010 6:39 AM in response to pterobyte

Hi,

You are the machine, pterobyte.

As you said, the slowdown was caused by the amount of junk mail in /var/virusmails.

We recently had a spam attack, and there are many junk emails in that directory, approximately 80,000.
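For anyone cleaning up a similar backlog, a minimal sketch: count the files first, then remove old ones with `find ... -exec rm -f {} +`, which passes many paths to each `rm` instead of starting one `rm` per file as the `-exec rm -r {} ;` form seen in the process listing does. The function names and the 30-day threshold in the usage note are assumptions; check what is in the directory before deleting anything.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and pruning a quarantine directory.

count_files() {
    # Print the number of regular files under directory $1.
    # (File names containing newlines would be miscounted.)
    find "$1" -type f | wc -l | tr -d ' '
}

purge_old() {
    # $1 = directory, $2 = age in days.
    # Delete regular files last accessed more than $2 days ago.
    # "{} +" batches many file names into each rm invocation.
    find "$1" -type f -atime +"$2" -exec rm -f {} +
}
```

Usage would look like `count_files /var/virusmails` followed by `purge_old /var/virusmails 30`, run as a user with permission to write to that directory.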

Again, thanks for your clues and help.

Bye!
