Sakshale2

Q: launchd keeps respawning script job

OS 10.10.5 Yosemite

 

I have a simple plist to run a script once per day at 3:35 AM.  Unfortunately, as soon as the script completes, it gets restarted, over and over again. Since this is an administrative task, to be run by root, the plist is in /Library/LaunchDaemons/.

 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.rukbat.neptune</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/sh</string> 
        <string>/usr/local/etc/neptune-backup.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>35</integer>
    </dict>
</dict>
</plist> 

 

I've tried adding / deleting multiple different entries for Disabled, KeepAlive, StartInterval, you name it.  No matter what I do, as soon as I "load" the plist with launchctl, the script starts running.  If I manually kill the process, it simply restarts.  What is cute is that I am copying a 4 GB file over a network connection.  As soon as the copy completes, the script restarts, zeroing the file and copying it again.  I can do all sorts of tricks, like checking for the existence of the file, to work around this problem.  However, I would rather know what I am doing wrong with launchd.

MacBook Air, OS X Yosemite (10.10.5)

Posted on Apr 17, 2016 1:17 PM


  • by Linc Davis, Helpful

    Linc Davis Apr 18, 2016 5:21 AM in response to Sakshale2
    Level 10 (207,926 points)

    I suspect that your script forks into the background, which would mean that launchd could not tell whether it was running or not. See the launchd.plist(5) man page.

  • by etresoft, Helpful

    etresoft Apr 18, 2016 4:34 AM in response to Sakshale2
    Level 7 (29,056 points)

    Hello Sakshale2,

    I suggest not starting with a root launchd script that does something like this. Instead, create one that has similar program arguments but calls a shell script that just does something like "/bin/date > /tmp/date.txt". That will be much easier to debug and test.
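
A minimal sketch of the kind of test script suggested above, assuming it is saved wherever the test job's ProgramArguments point. It appends instead of overwriting (a small variation on the suggestion, not part of it), so each launch leaves its own line and a respawn loop shows up as a file that keeps growing.

```shell
#!/bin/sh
# Stand-in for the real backup script: record each launch and do nothing else.
# /tmp/date.txt is the path from the suggestion above; appending with >>
# instead of > is an assumption made here so repeated launches stay visible.
OUT=/tmp/date.txt
/bin/date >> "$OUT"
```

If the file gains one line per scheduled run, the plist is behaving; if lines pile up minutes apart, the job is being relaunched.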

  • by Sakshale2

    Sakshale2 Apr 18, 2016 5:19 AM in response to etresoft
    Level 1 (4 points)

    Interesting -- I would say launchd is broken in comparison to cron.

    Here is the action part of the script:

    if [ -f ${BKP_FILE} ]; then
         # cp ${BKP_FILE} ${TAR_DIR}
         # rsync ${BKP_FILE} ${TAR_DIR}
         echo The Script Made it to here!
    else
         echo ${BKP_FILE} is missing
         exit
    fi
    

    As soon as I followed your recommendation and changed the script to simply use the echo, it stopped re-spawning.

     

    I did not do anything to the plist file in /Library/LaunchDaemons.  I simply edited the script in /usr/local/etc.  When I changed it back to the rsync line, launchd immediately restarted the process.  No "load" required!

     

    bash-3.2# !ps
    ps auxww |grep neptune
    root             8858  30.7  0.0  2452248   1020   ??  RN    4:52AM   0:15.44 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             8860  22.0  0.0  2451992    664   ??  SN    4:52AM   0:11.06 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             8878   0.0  0.0  2432772    644 s000  S+    4:53AM   0:00.00 grep neptune
    root             8859   0.0  0.0  2451736    444   ??  SN    4:52AM   0:00.00 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             8854   0.0  0.0  2444636    976   ??  SNs   4:52AM   0:00.00 /bin/sh /usr/local/etc/neptune-backup.sh
    bash-3.2# 
    

    I then directly edited the script file and changed it to use the cp command, commenting out the rsync line.  I re-ran the ps immediately upon exiting vi and, lo and behold, without my sending any signal to launchd, it had restarted the job with the cp as soon as the rsync completed.  Going in the opposite direction took longer, as the cp had to completely copy a 4.7 GB file over the network.

    bash-3.2# !ps
    ps auxww |grep neptune
    root             8925  14.5  0.0  2433788   1552   ??  UN    4:55AM   0:00.48 cp /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             8929   0.0  0.0  2423356    200 s000  R+    4:55AM   0:00.00 grep neptune
    root             8921   0.0  0.0  2444636    980   ??  SNs   4:55AM   0:00.00 /bin/sh /usr/local/etc/neptune-backup.sh
    bash-3.2# 
    

    Reading through man 5 launchd.plist, running cp or rsync appears to be treated as "the moral equivalent of daemon(3) by calling fork(2) and have the parent process exit(3) or _exit(2)."  ***! If launchd cannot run a simple command from a shell script, then 90% of my traditional cron scripts are useless!  After all, the primary script doesn't "do" much more than check system status, set up the calling variables and run a command.


    For example, this script was 37 lines of setup, error checking and documentation, to run this one command:

         cp ${BKP_FILE} ${TAR_DIR}

     

    For the record: its twin brothers have been running on Linux and Solaris boxes for years, using cron.

  • by Sakshale2

    Sakshale2 Apr 18, 2016 5:50 AM in response to Linc Davis
    Level 1 (4 points)

    The plist states that this is a command to run once per day at 3:35 AM.

         StartCalendarInterval: Hour 3, Minute 35

    At 5:23 AM it has a record of process 9651 running.

    bash-3.2# date
    Mon Apr 18 05:23:41 PDT 2016
    bash-3.2# launchctl list|grep ruk
    9651       0     com.rukbat.neptune
    bash-3.2# ps auxww |grep neptune
    root             9655  28.6  0.0  2443032   1008   ??  SN    5:23AM   0:16.49 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9657  27.0  0.0  2451992    680   ??  UN    5:23AM   0:13.96 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9680   0.0  0.0  2432772    644 s000  S+    5:24AM   0:00.00 grep neptune
    root             9656   0.0  0.0  2451736    460   ??  SN    5:23AM   0:00.00 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9651   0.0  0.0  2444636    984   ??  SNs   5:23AM   0:00.00 /bin/sh /usr/local/etc/neptune-backup.sh
    bash-3.2#
    

    Once process 9651 completes normally, why would the system restart the process?  It should wait until tomorrow.  Here is what I see now:

    bash-3.2# launchctl list|grep ruk
    9831      0     com.rukbat.neptune
    bash-3.2# ps auxww |grep neptune
    root             9835  26.4  0.0  2443032   1008   ??  SN    5:30AM   0:13.57 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9837  25.9  0.0  2442776    676   ??  UN    5:30AM   0:11.25 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9856   0.0  0.0  2441988    660 s000  S+    5:30AM   0:00.00 grep neptune
    root             9836   0.0  0.0  2442520    452   ??  SN    5:30AM   0:00.00 rsync /neptune/Monday-wikibackup.tar.gz /usr/local/etc/neptune
    root             9831   0.0  0.0  2444636    980   ??  SNs   5:30AM   0:00.00 /bin/sh /usr/local/etc/neptune-backup.sh
    bash-3.2#
    

     

    For the record: I added the /bin/sh as part of my testing.  Originally I only had the script listed, but thought adding the /bin/sh might stop the restarting -- no such luck.  The shell process 9831 will not complete/exit until the rsync processes finish.  At that time, it exits and launchd spawns a replacement.

     

    For the record: the file copy is always successful.

     

    bash-3.2# tail /var/log/neptune.log
    Database / Wiki Backup completed on 05:32
    Database / Wiki Backup completed on 05:34
    Database / Wiki Backup completed on 05:35
    Database / Wiki Backup completed on 05:37
    Database / Wiki Backup completed on 05:38
    Database / Wiki Backup completed on 05:39
    Database / Wiki Backup completed on 05:41
    Database / Wiki Backup completed on 05:42
    Database / Wiki Backup completed on 05:44
    Database / Wiki Backup completed on 05:45
    bash-3.2# 
    
  • by Linc Davis

    Linc Davis Apr 18, 2016 6:55 AM in response to Sakshale2
    Level 10 (207,926 points)

    Try setting "KeepAlive" to false.

  • by Sakshale2

    Sakshale2 Apr 18, 2016 7:59 AM in response to Linc Davis
    Level 1 (4 points)

    Unfortunately, my experiments with Disabled, KeepAlive, StartInterval, and many other things yesterday, had no impact.

     

    For one thing, the "default" for KeepAlive is supposed to be false.

     

    KeepAlive <boolean or dictionary of stuff>
         This optional key is used to control whether your job is to be kept
         continuously running or to let demand and conditions control the invocation.
         The default is false and therefore only demand will start the job.

     

    However, yesterday I ran most of my tests with KeepAlive=False set.  I stripped it and a bunch of other stuff from the plist for posting.

        <key>Disabled</key>
        <true/>
        <key>KeepAlive</key>
        <false/>
        <key>Label</key>
        <string>com.rukbat.neptune</string>
        <key>Nice</key>
        <integer>1</integer>
        <key>OnDemand</key>
        <false/>

     

     

  • by etresoft

    etresoft Apr 18, 2016 8:00 AM in response to Sakshale2
    Level 7 (29,056 points)

    Hello again Sakshale2,

    Launchd is different than cron, but the process of learning it is identical. Start in user space. Start by echoing something to a text file. Only when you are confident that you can control that behaviour do you move up to some real task or use root. I see no evidence that you have done that. Until then, you will continue to have problems.

     

    Here are some things to keep in mind.

     

    Before doing any work on the scripts, you must unload first. This is not optional.

    You must make sure no other script is using the same label. After unloading, run launchctl and check for your label to make sure it is gone.

     

    At some point, you may run into a problem when running daemon-style tools from launchd. Any such tool should also have the capability to run in non-daemon mode. But that is a moot point because you aren't yet ready to try anything like that.
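
The unload-and-verify step described above could be sketched as follows. The helper name label_in_list and the plist filename com.rukbat.neptune.plist are assumptions made for illustration; only the label and the /Library/LaunchDaemons directory come from the thread. The helper reads `launchctl list`-style output (PID, status, label) on stdin, so it can be tried on a sample line without touching launchd.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given label appears in the third
# column of `launchctl list`-style output read from stdin.
label_in_list() {
    awk -v label="$1" '$3 == label { found = 1 } END { exit !found }'
}

# Try it on a sample line in the same layout as the output shown in the thread.
printf '9651\t0\tcom.rukbat.neptune\n' |
    label_in_list com.rukbat.neptune && echo "label is loaded"

# On the Mac itself, as root (the plist filename is an assumption):
#   launchctl unload /Library/LaunchDaemons/com.rukbat.neptune.plist
#   launchctl list | label_in_list com.rukbat.neptune || echo "label is gone"
#   ... edit the plist or script ...
#   launchctl load /Library/LaunchDaemons/com.rukbat.neptune.plist
```

Matching on the whole third column, instead of a plain grep, avoids a false hit from another job whose label merely contains the same text.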

  • by Sakshale2

    Sakshale2 Apr 18, 2016 10:54 AM in response to etresoft
    Level 1 (4 points)

    > Before doing any work on the scripts, you must unload first. This is not optional.

     

    I would have assumed that was "before doing any work on the plists".  Editing, during the daytime, a script that is only called once a day at 3:30 in the morning should be a non-issue.  If launchd is constantly scanning all the programs and scripts that are referenced in plists, 24/7, that is a real waste of resources.  plists need to be reloaded to update all internal flags and such, so I can see how working on them could easily cause problems.  If all programs and scripts referenced in plists are the issue, then normal system updates, which can happen at any time, would also cause problems.

     

    But, then, I am a cantankerous, old-school, Unix nerd who cut his teeth managing Solaris systems.  What would I know about these newfangled tools?

     

    FYI -- I was also testing LaunchControl, a donation-ware graphical tool for managing plists, which can be found at soma-zone.  It definitely looks like it will help, once I understand the rules enough to stop breaking things.

  • by etresoft

    etresoft Apr 18, 2016 11:23 AM in response to Sakshale2
    Level 7 (29,056 points)

    Sakshale2 wrote:

     

    I am a cantankerous, old-school, Unix nerd who cut his teeth managing Solaris systems.  What would I know about these newfangled tools?

    Hello again Sakshale2,

    Me too. But ask yourself this. How much success would you have had when you first learned Solaris if you had started with advanced administrative tasks?

     

    When I say "before doing any work on the scripts", I mean just that. This isn't Solaris and you don't know how it works. Before jumping in and running rsync as root at 3am, you really need to understand what is going to happen. I am not making any statement about how launchd works. That is irrelevant here. The issue is your understanding of launchd.

  • by Linc Davis

    Linc Davis Apr 18, 2016 12:28 PM in response to Sakshale2
    Level 10 (207,926 points)

    "OnDemand" should not be used at all, according to the man page. After making any changes to the plist, you have to explicitly unload and reload it.

     

    Is anything logged when the job runs?

  • by Sakshale2

    Sakshale2 Apr 20, 2016 8:33 AM in response to Linc Davis
    Level 1 (4 points)

    Well, color me completely confused.  I've been silent simply because I wanted to verify my observation.

     

    The last time I "loaded" the plist, with no changes not previously tried and tested, it did not immediately spawn a job.  I waited to see what would happen, and surprisingly it ran at the correct time and completed normally.  I wasn't online yesterday, so I let it run again this morning.  Execution was perfect!

     

    The only change from the first one I posted was the addition of logging instructions.

        <key>StandardErrorPath</key>
        <string>/tmp/com.rukbat.neptune.stderr</string>
        <key>StandardOutPath</key>
        <string>/tmp/com.rukbat.neptune.stdout</string>

     

     

    Thanks for your input.

    I am simply not going to touch this again, under the rule: if it works, don't fix it.
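
With StandardErrorPath set as in the plist fragment above, anything the script writes to stderr ends up in that file. A small sketch of a logging helper the script could use so that each launch is identifiable; the function name log and the message text are hypothetical, not taken from the original script.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write a timestamped line that
# includes this shell's PID to stderr, so separate launches can be told apart.
log() {
    printf '%s [pid %s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$*" >&2
}

log "neptune-backup started"
# ... the real work (cp or rsync) would go here ...
log "neptune-backup finished"
```

A new PID on every "started" line, minutes apart, would confirm that whole new jobs are being spawned and not a single job looping internally.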

  • by cgerke

    cgerke May 23, 2016 3:09 AM in response to Sakshale2
    Level 1 (12 points)

    What are the chances that your backup task only takes a few seconds to complete?
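
One way to answer that question is to time the job. A minimal sketch, assuming a hypothetical wrapper named timed_run; the `sleep 1` stands in for the real script (/usr/local/etc/neptune-backup.sh in the thread).

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then report its exit status and
# wall-clock duration in whole seconds.
timed_run() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "exit=$rc elapsed=$((end - start))s"
    return "$rc"
}

# Stand-in for: timed_run /bin/sh /usr/local/etc/neptune-backup.sh
timed_run sleep 1
```

The reported duration can then be compared with the gaps between the "Backup completed" lines in the log.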