5 Replies Latest reply: Apr 15, 2013 8:08 AM by trainwreck
Cowan Pettigrew Level 1 (0 points)

Hi all,

 

I'm running 2 Mac mini servers: one is a PDC on SL 10.6.8 and the other runs mail/iChat services, also on 10.6.8.

 

DNS is all configured correctly.

 

Issue: randomly, across Windows/Mac/iPhone/iPad mail clients, the end user receives "mail could not be received at this time. the server rejected your username and password etc etc please go have lunch and come back and try again."

 

I cannot find any pattern to this issue. For the same user, it works on his iPad but won't auth on his Mac.

 

Generally, some users will go for weeks without a hitch, then randomly get auth failures for a short period of time.

 

Of note, when this happened to me on my Mac, I did a tcpdump and could see that the Mail app wasn't even connecting out. However, I don't think that is the issue generally.

 

I'm reasonably skilled in the sysadmin space and yet have been unable to find the cause, especially given it's so random.

 

In the logs, when a user fails, there is sometimes a successful auth entry and other times there isn't. Sometimes, although more rarely, there is a failed auth and, after 6 tries, a subsequent account lockout.

 

Has anyone else seen this?

 

I was thinking of upgrading both servers to ML in the vain hope that it would cure the problem, but that seems a bit like a sledgehammer (one that might bring other issues).

 

Cheers


macbook pro, Mac OS X (10.6.6)
  • 1. Re: Authentication randomly fails - SL 10.6.8
    Cowan Pettigrew Level 1 (0 points)

    Further to this, I've found that in the syslog and the mail log, dovecot creates an entry like this:

     

    Sep 25 14:12:49 mail dovecot[62193]: auth(default): od(bizhub,14.1.36.38): Unable to lookup user record

     

    In the OD logs I can see successful auths. However, failed auths such as the one above do not show up; I suspect the auth process is getting lost somehow.
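
    To see whether these lookup failures cluster around particular users or times, the mail log could be scanned for entries shaped like the one quoted above. This is only a rough sketch in Python: the regular expression is inferred from that single log line, and the function names are made up for illustration, so other dovecot versions or log formats may need a different pattern.

```python
import re
from collections import Counter

# Pattern inferred from the single dovecot entry quoted in this post
# (an assumption; adjust it if your log lines look different).
LOOKUP_FAILURE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ dovecot\[\d+\]: "
    r"auth\(default\): od\((?P<user>[^,]+),(?P<ip>[^)]+)\): "
    r"Unable to lookup user record"
)


def lookup_failures(lines):
    """Yield (timestamp, user, client_ip) for each failed user-record lookup."""
    for line in lines:
        match = LOOKUP_FAILURE.match(line.strip())
        if match:
            yield match.group("stamp"), match.group("user"), match.group("ip")


def failures_per_user(lines):
    """Count failed lookups per user name."""
    return Counter(user for _, user, _ in lookup_failures(lines))


if __name__ == "__main__":
    sample = [
        "Sep 25 14:12:49 mail dovecot[62193]: auth(default): "
        "od(bizhub,14.1.36.38): Unable to lookup user record",
    ]
    for stamp, user, ip in lookup_failures(sample):
        print(stamp, user, ip)
```

    Feeding it the lines of the mail log (for example via `open(path)`) gives a list of failure timestamps that can be compared against whatever else is happening on the servers at those times.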

     

    I have logging set to debug for IMAP/SMTP.

     

    I'm investigating now how dovecot looks up the local user table to see why the connection is failing.

  • 2. Re: Authentication randomly fails - SL 10.6.8
    Cowan Pettigrew Level 1 (0 points)

    Looking in the dovecot conf file at /etc/dovecot/dovecot.conf, I found the following entry:

     

    # Maximum number of running mail processes. When this limit is reached,
    # new users aren't allowed to log in.
    # Note:  The "max_mail_processes" key is modified by the Server Admin Mail
    #          plugin.  Please do not manually modifying this line may change
    #          the behavior of the Admin HI
    max_mail_processes = 100

     

    Now, in Server Admin for mail I had the max number of IMAP connections set to 500. This wasn't reflected in the dovecot file, as shown above.

     

    The number of IMAP connections for our server does go over 100, so I suspect that even though Server Admin showed 500 as the max, the dovecot file ignored it and the limit of 100 still applied.
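
    A quick way to test this kind of suspicion is to read the active value out of the config text and compare it with the connection count actually observed. The following is a minimal sketch, assuming the `key = value` layout of the entry quoted above; the helper names and the observed count are hypothetical, and treating the last uncommented setting as the effective one is also an assumption.

```python
import re


def max_mail_processes(conf_text):
    """Return the active max_mail_processes value, or None if it is not set."""
    value = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out lines
        match = re.match(r"max_mail_processes\s*=\s*(\d+)", stripped)
        if match:
            # Assumption: the last uncommented setting is the effective one.
            value = int(match.group(1))
    return value


def limit_exceeded(conf_text, observed_connections):
    """True if the observed connection count is above the configured limit."""
    limit = max_mail_processes(conf_text)
    return limit is not None and observed_connections > limit


if __name__ == "__main__":
    conf = """
    # Maximum number of running mail processes. When this limit is reached,
    # new users aren't allowed to log in.
    max_mail_processes = 100
    """
    # 120 is a made-up observed count, used only to illustrate the check.
    print(max_mail_processes(conf), limit_exceeded(conf, 120))
```

    In practice the text would come from reading /etc/dovecot/dovecot.conf, and the observed count from whatever the server reports for current IMAP connections.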

     

    I altered the dovecot file as root to:

     

    # Maximum number of running mail processes. When this limit is reached,
    # new users aren't allowed to log in.
    # Note:  The "max_mail_processes" key is modified by the Server Admin Mail
    #          plugin.  Please do not manually modifying this line may change
    #          the behavior of the Admin HI
    max_mail_processes = 500

     

    This changed the max IMAP connections value shown in Server Admin to 2500.

     

    So far we have not had any dovecot "Unable to lookup user record" errors, hence no failed auths.

     

    I'll wait for a few days to test the impact of this change.

     

    cheers

  • 3. Re: Authentication randomly fails - SL 10.6.8
    Cowan Pettigrew Level 1 (0 points)

    Bugger. OK, it's not the above solution. I'll keep looking.

  • 4. Re: Authentication randomly fails - SL 10.6.8
    Cowan Pettigrew Level 1 (0 points)

    Hi all,

     

    I thought I'd update my post with what I found to be the answer.

     

    Summary of original issue:

     

    Random account auth failure via dovecot on the mail server. End users would randomly get asked to "please re-enter your password" in Mac Mail, Outlook or Thunderbird.

     

    Our server setup is a Mac mini running iChat and mail, auth'ing to a Mac mini running OD, each in a different subnet.

     

    When the auth failed on the mail server, the logs would show no user found in lookup table...

     

    On the OD master, there were no relevant entries for a failed attempt.

     

    It should be noted that our account lockout policy is 6 tries.

     

    Both servers' logs are running in debug mode. Kerberos and DNS are all perfect.

     

    After spending ages looking through dovecot for errors I happened onto this thread.

     

    https://discussions.apple.com/thread/2404563?start=45&tstart=0

     

    Findings:

     

    On the OD master, Time Machine runs a script on the hour, every hour, called prehookbackup or thereabouts. This script runs a command which initiates an OD shutdown via slapd:

     

    Oct 10 14:20:31 domainsvr slapd[41442]: daemon: shutdown requested and initiated.
    Oct 10 14:20:31 domainsvr slapd[41442]: slapd shutdown: waiting for 0 threads to terminate
    Oct 10 14:20:32 domainsvr slapd[41442]: slapd stopped.
    Oct 10 14:20:35 domainsvr slapd[43971]: @(#) $OpenLDAP: slapd 2.4.11 (Aug 12 2010 17:17:10) $
    Oct 10 14:20:35 domainsvr slapd[43971]: daemon: SLAP_SOCK_INIT: dtblsize=8192
    Oct 10 14:20:35 domainsvr slapd[43971]: bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
    Oct 10 14:20:35 domainsvr slapd[43971]: slapd starting

     

    The purpose of this, as I understand it, is so TM can back up the OD files. I understand that. To prevent corruption, it essentially locks the OD files.

     

    Meanwhile, however, dovecot, which has its own local user table for mail lookup purposes but holds no auth details, is still trying to auth to the OD master. It gets a null reply, which is technically correct, but its local password lockout script disables the end user's account the moment OD comes back online after TM has finished doing its thing.

     

    Hence the random nature of the account lockouts.
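
    One way to check this theory against the logs is to extract the slapd shutdown/start windows from the OD master and see whether the mail server's auth failures fall inside them. The sketch below is in Python and makes several assumptions: the slapd lines look like the ones quoted above, both servers' clocks are in sync, the year is supplied by hand (syslog timestamps omit it), and the 5-second grace period is an arbitrary guess. The function names and the sample failure time are invented for illustration.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def parse_stamp(line, year):
    """Parse the leading syslog timestamp of a line; the year is supplied."""
    return datetime.strptime(f"{year} {line.strip()[:15]}", STAMP_FORMAT)


def slapd_outages(lines, year):
    """Return (down, up) datetime pairs from slapd shutdown/start entries."""
    outages, down = [], None
    for line in lines:
        line = line.strip()
        if "slapd" not in line:
            continue
        if "shutdown requested" in line:
            down = parse_stamp(line, year)
        elif line.endswith("slapd starting") and down is not None:
            outages.append((down, parse_stamp(line, year)))
            down = None
    return outages


def failures_during_outages(failure_times, outages, grace=timedelta(seconds=5)):
    """Return the failure times that fall inside an outage window (+/- grace)."""
    return [
        t for t in failure_times
        if any(down - grace <= t <= up + grace for down, up in outages)
    ]


if __name__ == "__main__":
    slapd_log = [
        "Oct 10 14:20:31 domainsvr slapd[41442]: daemon: shutdown requested and initiated.",
        "Oct 10 14:20:32 domainsvr slapd[41442]: slapd stopped.",
        "Oct 10 14:20:35 domainsvr slapd[43971]: slapd starting",
    ]
    year = 2012  # assumption: not stated in the logs
    outages = slapd_outages(slapd_log, year)
    # Hypothetical failure time, not taken from any real log.
    failures = [datetime(year, 10, 10, 14, 20, 33)]
    print(outages)
    print(failures_during_outages(failures, outages))
```

    If most failures land inside these windows, that supports the TM explanation. If they do not, the cause is probably elsewhere.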

     

    Testing so far:

     

    On the OD master we have disabled TM for now. So far we have not had an account failure. This is not a fix, however, as we need to run a backup. Of note, on page 36+ of the advanced server admin guide for SL, Apple does not recommend TM for servers; its purpose is initiating TM on connected clients.

     

    Other input from other members has also pointed out that Spotlight on the server should be disabled, as the mdworker daemon clashes with TM, which clashes with OD and causes OD to crash. Check your server crash logs to see if this is the case.

     

    I have not disabled Spotlight yet as I'm isolation testing to prove it's TM. So far so good though.

     

    I'll update this post in a week following isolation testing of the TM functionality. My plan is, after a week, if there have been no errors, to reinstate TM and see if the errors resurface. I'm just about 100% sure they will.

     

    This is a great example of 2 pieces of software doing exactly what they are supposed to be doing but without talking to each other first.

     

    cheers

  • 5. Re: Authentication randomly fails - SL 10.6.8
    trainwreck Level 1 (5 points)

    I am dealing with this issue right now, and was wondering if you've tried excluding items from TM, instead of disabling.