Share monitor stuck on waiting

I am running 10.7.2 on a mid-2010 12-core Mac Pro. To use all of the cores, I have enabled Qmaster in Compressor 4 ("Share this computer as QuickCluster with services"). I submitted a job to the cluster (which is, in this case, my own computer) but Share Monitor just says "waiting".


I have recreated the /Users/Shared/Library/Application Support/Apple Qmaster/Storage directories. I have checked the log files. All they show is that the cluster has started up. I have turned off my firewalls. Nothing seems to work ....

Posted on Jan 23, 2012 8:32 PM


Jan 23, 2012 11:56 PM in response to Warwick Teale

have you looked at the logs in ~/Library/Application Support/Apple Qmaster/Logs ?


Yep. Log files below.


Items 1-2 log files:


contentagent.log


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081519.574" tmt="01/23/2012 23:05:19.574" pnm="ContentAgent">

<mrk tms="349081519.575" tmt="01/23/2012 23:05:19.575" pid="414" kind="begin" what="log-session"/>


cluster.admin.log


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081519.448" tmt="01/23/2012 23:05:19.448" pnm="CompressorJobController">

<mrk tms="349081519.449" tmt="01/23/2012 23:05:19.449" pid="415" kind="begin" what="log-session"/>


cluster.admin.log

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081699.451" tmt="01/23/2012 23:08:19.451" pnm="qmasterqd">

<mrk tms="349081699.453" tmt="01/23/2012 23:08:19.453" pid="633" kind="begin" what="log-session"/>

<mrk tms="349081699.483" tmt="01/23/2012 23:08:19.483" pid="633" kind="begin" what="CJobControllerService::publishClusterStorage"></mrk>

<log tms="349081699.483" tmt="01/23/2012 23:08:19.483" pid="633" msg="Cluster storage URL = file2nfs://localhost/Users/dmoor/Library/Application%20Support/Apple%20Qmaster/Storage/4B645808-0B537A59/shared/"/>

<log tms="349081699.483" tmt="01/23/2012 23:08:19.483" pid="633" msg="Publishing shared storage."/>

<log tms="349081704.546" tmt="01/23/2012 23:08:24.546" pid="633" msg="Subscribing to shared storage, local path = /Users/Shared/Library/Application Support/Apple Qmaster/Storage/4B645808-0B537A59/shared"/>

<log tms="349081704.569" tmt="01/23/2012 23:08:24.569" pid="633" msg="Result cluster storage URL = nfs://Mac-Pro-Shared.local/Users/me/Library/Application%20Support/Apple%20Qmaster/Storage/4B645808-0B537A59/shared"/>

<mrk tms="349081704.569" tmt="01/23/2012 23:08:24.569" pid="633" kind="end" what="CJobControllerService::publishClusterStorage"></mrk>


contentcontroller.log

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081704.631" tmt="01/23/2012 23:08:24.631" pnm="ContentController">

<mrk tms="349081704.631" tmt="01/23/2012 23:08:24.631" pid="641" kind="begin" what="log-session"/>


qmaster.executor-(1-23).log


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081699.451" tmt="01/23/2012 23:08:19.451" pnm="Qmaster%20Task%20Service">

<mrk tms="349081699.453" tmt="01/23/2012 23:08:19.453" pid="632" kind="begin" what="log-session"/>


stomp.log

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081519.689" tmt="01/23/2012 23:05:19.689" pnm="CompressorTranscoder">

<mrk tms="349081519.690" tmt="01/23/2012 23:05:19.690" pid="416" kind="begin" what="log-session"/>


stompx.log

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349081699.533" tmt="01/23/2012 23:08:19.533" pnm="CompressorTranscoderX">

<mrk tms="349081699.534" tmt="01/23/2012 23:08:19.534" pid="634" kind="begin" what="log-session"/>


qmaster.executor logs (24 of em), which all have:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<logs tms="349083267.040" tmt="01/23/2012 23:34:27.040" pnm="Qmaster%20Task%20Service">

<mrk tms="349083267.041" tmt="01/23/2012 23:34:27.041" pid="564" kind="begin" what="log-session"/>
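
For anyone who would rather not read these files by eye, here is a minimal Python sketch that pulls the `<log>` and `<mrk>` entries out of a Qmaster log. It uses a regular expression rather than an XML parser because the excerpts above show no closing `</logs>` tag, which a strict parser may reject. The function names are hypothetical, and treating `tms` as seconds since 2001-01-01 UTC is an assumption (it is consistent with the `tmt` values above if the machine was on US Pacific time).

```python
import re
from datetime import datetime, timedelta, timezone

# Matches name="value" attribute pairs inside one element.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
# Matches the opening of a <log ...> or <mrk ...> element (but not <logs ...>).
ENTRY_RE = re.compile(r'<(log|mrk)\s+([^>]*?)/?>')

# Assumption: "tms" counts seconds from 2001-01-01 00:00:00 UTC.
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)


def parse_entries(text):
    """Return one dict of attributes per <log> or <mrk> element found in text."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        entry = dict(ATTR_RE.findall(match.group(2)))
        entry["element"] = match.group(1)
        entries.append(entry)
    return entries


def tms_to_utc(tms):
    """Convert a tms attribute value to a UTC datetime (under the assumption above)."""
    return EPOCH_2001 + timedelta(seconds=float(tms))
```

Filtering the result for entries with a `msg` attribute gives just the messages, which makes it quick to see that most of these files contain nothing beyond the "log-session" marker.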


Item 3: NFS

I turned off networking, and the submissions threw an error that they could not create a hard link. Since NFS was unavailable, this seemed logical. At one point I saw it creating the links, but I seem to be unable to reproduce it.


Item 4: Have 37 GB of free memory after submitting.


Item 5: With ethernet off nfs mounts failed (once).


Item 6: "Copy as needed" is selected; I don't see any copying occurring. Actually, I don't even see the filenames in the executor logs. One time they did show up (when the NFS links were created).


Item 7: Two things show:


on Startup

1/23/12 11:34:32.277 PM mDNSResponder: Excessive update rate for Mac\032Pro\032-\032Shared\032Cluster-C741479F-E33E-41DE-B788-081C08C3E6DB._qmp5._tcp.local.; delaying announcement by 1 second

and randomly occurring:


1/23/12 11:53:12.000 PM kernel: ALF: ifnet_get_address_list_family error 12

Item 8: Removed Perian via System Preferences

Jan 24, 2012 12:58 AM in response to PhotoShoot

hmm.. "nothing to see here".. (ITM..) that's a shame.... before we look at other things..


This message "1/23/12 11:34:32.277 PM mDNSResponder: Excessive update rate for Mac\032Pro\032-\032Shared\032Cluster-C741479F-E33E-41DE-B788-081C08C3E6DB._qmp5._tcp.local.; delaying announcement by 1 second" is just Bonjour throwing a tantrum. Yeah, when you crank up QMASTER, especially with 16 instances, it whinges for a while....
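
In case the `\032` bits in that service name look alarming: they are just escaped spaces. A small Python sketch, assuming the usual DNS presentation convention where a backslash followed by three decimal digits stands for one character code (the function name is hypothetical, and this only handles ASCII codes properly):

```python
import re


def decode_mdns_label(name):
    """Replace each backslash + three-decimal-digit escape with the character it encodes.

    Only intended for ASCII codes such as 032 (space).
    """
    return re.sub(r"\\(\d{3})", lambda m: chr(int(m.group(1))), name)


print(decode_mdns_label(r"Mac\032Pro\032-\032Shared\032Cluster"))
# prints: Mac Pro - Shared Cluster
```

So the name being announced is simply the cluster name with spaces in it.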


ok.. can't see anything in the qmaster/logs that's obvious. you would have seen it.......


tell you what.. try this..

  1. In Compressor.app | application menu Apple Qmaster | Share This Computer... Setup tab: untick Rendering and untick Compressor and click OK. Supply your root password. This will stop the subtasks in an orderly fashion. Wait 10 seconds, or look at Activity Monitor.app to see whether they have all gone away. Then...
  2. While you are still in Compressor.app | application menu Apple Qmaster | Share This Computer... Setup tab: untick Share This Computer and click OK. Supply your root password. This will shut down the qmasterd setup in an orderly way but, I believe, leave the main task up.
  3. quit compressor
  4. Do the same on any other hosts in the cluster. In fact, shut down any service nodes on other hosts if you can. Let's see if we can get a functional run of a batch on one host...
  5. Now get the BIG HAMMER out and either delete or move the WHOLE Apple Qmaster folder at ~/Library/Application Support/Apple Qmaster. Use the Finder: move it to the Trash or to the Desktop... just get rid of it. Yes, do it. Qmaster will build a new one...
  6. restart compressor on your controller host... (your 2010 MACPRO 12 core...)
  7. Set up the Compressor options for Qmaster again, as before: Compressor.app | application menu Apple Qmaster | Share This Computer... Tick these: Share this computer, QuickCluster and Compressor; set the Compressor options to 6 instances (test only). Click OK and enter your password.
  8. Leave the cluster storage at its default, where you currently have it: file2nfs://localhost/Users/dmoor/Library/Application%20Support/Apple%20Qmaster/Storage/4B645808-0B537A59/shared/
  9. NOW!!!! Make sure you can MOUNT THE CLUSTER STORAGE. As we've mentioned in many posts recently, the cluster may not be available until you can mount the storage logically. Do this with Compressor.app | File menu | Mount Cluster Storage (Command+Shift+M). If the cluster storage doesn't show in the list box, wait a while until it does. If the cluster storage is there then redo steps 1 to 3 and resume here at step 9... else if you get impatient start from step again... (but no need)
  10. Let's test it out: in Compressor.app V4, just ADD A FILE (any clip over 15-30 mins, to trigger segmented transcoding) and use all the defaults, but pick a setting that provides segmented transcoding. Use the Inspector | Video and make sure "Job segmenting" is selected. All we want to do is see if it fires up.
  11. in Compressor.app V4, submit the batch to your new cluster.
  12. Excluding the usual bad luck we all seem to have at times, the job should start up. By all means use Share Monitor, but be patient, because sometimes it doesn't communicate well with the Qmaster app and you won't see any action... however...
  13. TIP: move your stylus/trackpad to the Dock and do a GEAR DOWN over the Compressor.app icon in the Dock. You should see 6 instances of Compressor active or available. You should also see a mention of one of the cluster storage systems you are using. If so, your cluster and Qmaster should be in working order. This may be a myth, but it's the best I've come up with other than trawling through Activity Monitor.app and sampling the subtasks.
  14. by this stage I REALLY HOPE your job has taken off.
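
If you would rather script step 5 than drag folders around in the Finder, here is a minimal Python sketch that moves the folder aside under a timestamped name instead of deleting it, so you can put it back if needed. The function name and the backup location are hypothetical; the Qmaster path is the one from step 5. Run it only after steps 1 to 3, with Compressor quit.

```python
import shutil
import time
from pathlib import Path


def move_aside(folder, backup_parent):
    """Move folder into backup_parent under a timestamped name.

    Returns the new path, or None if folder does not exist.
    """
    folder = Path(folder)
    if not folder.exists():
        return None
    backup_parent = Path(backup_parent)
    backup_parent.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_parent / f"{folder.name} {stamp}"
    shutil.move(str(folder), str(target))
    return target


# Example use (not run automatically):
# qmaster = Path.home() / "Library" / "Application Support" / "Apple Qmaster"
# print(move_aside(qmaster, Path.home() / "Desktop"))
```

Moving rather than deleting keeps the old logs around in case you want to compare them with the ones Qmaster writes after it rebuilds the folder.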


Post your results. This procedure works for us most of the time... we do need to be patient, even with our large Mac Pros.


HTH


warwick

Hong Kong

Gong ci fat cai / Kong hei fa choi btw!

Jan 25, 2012 10:13 AM in response to PhotoShoot

Very helpful.


It has gotten me to the point where it at least intermittently works. At this point it seems to be related to the processes. I was finding some abnormalities:


(1) Even when I closed down the cluster and exited Compressor, some processes seemed to hang around: CompressorTranscoder and CompressorJobController, for example. If I force quit them, they regenerate with qmasterd as the parent process, but that process (and pid) no longer exists!


(2) On reboot and starting compressor I had multiple CompressorTranscoders and CompressorJobControllers even though Qmaster was not enabled. Even exiting from Compressor and Share Monitor I still had one transcoder and one controller.


I rebooted and unchecked the "reopen windows on reboot" option. This seemed to help.


It seems as if the "correct" process startup order is:


Start compressor:

Compressor, compressord, CompressorJobController, qmasterca and qmasterd


but I have also seen (without a reboot):

Compressor, CompressorJobController, CompressorTranscoder, qmasterd


Start cluster


ContentAgent (sometimes!), ContentController and qmasterqd start up. When the cluster is stopped, ContentAgent may still run.


Puzzling that there is such a variation in process startups - although it may relate to the Qmaster directory in Application support.

Jan 25, 2012 7:59 PM in response to PhotoShoot

hmm.. this is a bit strange. Yes, you are correct about compressord and qmasterd hanging around. Possibly they are there when you start Compressor.app up. Needs more research, methinks!


In any case it looks like you have made some progress 😝. Are you able to do some transcoding now using Compressor and a cluster?


BTW.. some points that I am chasing at present:

  1. I'm using only a MANAGED CLUSTER across a few nodes for transcoding. I find QUICKCLUSTER unfortunately seems unstable. It was rock solid for me in V3.5. I spent many days looking into it and can't see a pattern. My colleague has the same issue.
  2. FCPX project "send to compressor" to a cluster over multiple service nodes. The job fails on the other Qmaster cluster nodes trying to access /var/folders/xx/ccc...archive on the controller node. Looks like the other service nodes want access to the cluster controller's /var/folders/. Even when this is mounted, it still fails... beats me.


The former is not a show stopper. I just deactivate the other instances on the nodes and the job works... any ideas?



I'll open a thread up on this this week unless I can resolve it.


Glad to see you have made some progress.

Jan 26, 2012 10:52 AM in response to PhotoShoot

This morning everything worked on reboot, even though "reopen windows when logging back in" was checked in the reboot dialogue box and Perian was later enabled. All evidence suggests that the problem is due to redundant processes, the cause of which is as yet undetermined. The starting point would be to make sure none of the qmaster or compressor processes are running (rebooting or removing the Apple Qmaster directory as necessary), start everything up, and make sure only the appropriate processes are running before you submit a job.
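
For anyone who wants to check for leftover processes without scrolling through Activity Monitor, here is a minimal Python sketch. It lists the process names mentioned in this thread and flags any whose parent pid is no longer in the process table, which is what I saw with the regenerated transcoders. The function names are hypothetical, and the watch list is only what I observed on my machine, not an official list.

```python
import subprocess

# Process names mentioned in this thread; adjust for your own setup.
WATCHED = (
    "qmasterd", "qmasterqd", "qmasterca", "compressord",
    "CompressorTranscoder", "CompressorTranscoderX",
    "CompressorJobController", "ContentAgent", "ContentController",
)


def find_leftovers(ps_output, watched=WATCHED):
    """Parse 'pid ppid command' lines and return watched processes.

    Each result is (pid, ppid, command, parent_alive), where parent_alive
    is False when the parent pid does not appear in the listing.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    live_pids = {pid for pid, _, _ in rows}
    found = []
    for pid, ppid, command in rows:
        name = command.rsplit("/", 1)[-1]  # strip any leading path
        if name in watched:
            found.append((pid, ppid, command, ppid in live_pids))
    return found


def current_leftovers():
    """Run ps for all processes and report the watched ones."""
    out = subprocess.run(
        ["ps", "-ax", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_leftovers(out)
```

With Compressor quit and the cluster stopped, `current_leftovers()` returning an empty list is the state I would want before starting everything up again.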


Thanks - you did it! Sorry, I have only this node, so am unable to address the managed vs QuickCluster issue.
