Qmaster Cluster Tests

871 Views 18 Replies Latest reply: Jan 29, 2014 1:46 AM by Warwick Teale
Rex Ross
Jan 20, 2013 8:09 AM

From all reports (and confirmed by my experience), Qmaster clusters in Compressor 4 leave a lot to be desired.

After several hours of puttering around, I finally got clusters to work. Here is the setup.

Two 27-inch iMac systems, each with 8 cores.

The cluster was set to use all 16 cores.

The video to be run through Compressor 4 was 5 minutes of 720p.

The test using the cluster with all 16 cores took 11 minutes to render. (And I was able to monitor all 16 cores in action with the Qmaster Admin panel.)

The test using This Computer Only took 5 minutes to render.

Not quite what I expected.

Perhaps the network overhead of distributing a 5-minute video made up the difference.

I am rerunning the test today with a 30-minute video to see the timing.

Does anyone see anything I may be missing here?

RR

iMac (27-inch Mid 2010), Mac OS X (10.6.8), + 21 inch Cinema Display
  • Russ H Level 6 (12,845 points)
    Jan 20, 2013 9:24 AM (in response to Rex Ross)

    Rex,

    The Wi-Fi connection will definitely be a factor in the longer-than-expected times.

    Here is an alternative test. Create a Quick Cluster (one iMac) with four instances. Your machine has hyper-threading, so four should be about ideal. Sixteen instances was probably excessive. (Apple's advice is half the cores for maximum efficiency.) For one, the OS and app need some of the resources. For another, after the file gets chopped up into all those segments and processing is completed, some time is needed to reassemble them. Reassembling 16 segments will presumably be more involved than, say, 4.
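
    A rough way to picture the trade-off described above is a toy cost model: encoding work divides across instances, while transfer and reassembly overhead does not. The sketch below is only an illustration. The function name and every number fed to it are hypothetical assumptions, not measurements from Compressor or Qmaster, and it assumes one segment per instance.

```python
def estimated_job_minutes(encode_minutes, instances,
                          transfer_minutes=0.0,
                          reassembly_minutes_per_segment=0.0):
    """Toy estimate of wall-clock time for a segmented encode.

    Assumptions (hypothetical, for illustration only):
      - the encode itself parallelises perfectly across instances
      - one segment is produced per instance
      - transfer and reassembly costs are paid serially
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    parallel_part = encode_minutes / instances
    overhead = transfer_minutes + reassembly_minutes_per_segment * instances
    return parallel_part + overhead


# Made-up inputs: a short job where overhead outweighs the parallel gain.
single = estimated_job_minutes(5.0, 1)
clustered = estimated_job_minutes(5.0, 16, transfer_minutes=8.0,
                                  reassembly_minutes_per_segment=0.2)
print("single machine:", single, "minutes")
print("16 instances over a slow link:", clustered, "minutes")
```

    With inputs like these, the clustered estimate comes out slower than the single machine, which is the general shape of the result reported at the top of the thread. A longer source file shifts the balance back toward the cluster, because the parallel part grows while the fixed overhead does not.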

     

    Post back with results and/or further insights.

    Thanks.

    Russ

  • Russ H Level 6 (12,845 points)
    Jan 20, 2013 10:04 AM (in response to Rex Ross)

    In the Setup window next to Compressor, click Options to select the number of instances. You shouldn't have to do anything else. When you submit, you should get a choice of This Computer or your Quick Cluster.

    BTW, discussion boards of all types are filled with threads of frustration with trying to set up distributed processing; you are not alone.

    Good luck with the QC.

    Russ

  • Warwick Teale Level 3 (555 points)
    Jan 20, 2013 11:26 PM (in response to Rex Ross)

    Hi Rex, here are a couple of extra things you might like to consider in setting up your cluster, plus some notes in addition to those already in this thread. I want to put these experiences and ideas down because it seems people are really having issues using this wonderful component, Compressor/Qmaster.

    Your Qmaster configuration (across your two iMacs):

    (1) Make and dedicate a separate subnet for your Qmaster nodes between the two iMacs.

    • You should try to use this for all your Qmaster interaction between the hosts, without incorporating other "internet" activity.
    • Connect a Cat 6 Ethernet cable directly from the Ethernet port of one iMac (host A) to the Ethernet port of the other iMac (host B).
    • In Network preferences on each Mac, give en0 (the Ethernet port) a manual, dedicated IP address, such as 1.1.1.1 on one and 1.1.1.2 on the other.
      In the Advanced settings for the Ethernet NIC, under Hardware, make sure that the speed is 1000baseT. And yes, set the network mask to the usual 255.255.255.0.
    • If you can, try to set the packet size (jumbo frames) to the max of 9000. (Maybe you can't; it seems hit and miss.) That means fewer network I/Os.
    • Ping each host from the other over those IP addresses to make sure your path is there.
    • In Network preferences, order the network interfaces so that your day-to-day internet Wi-Fi is at the top.
    • To access your normal "internet", use the Wi-Fi on each.
    • All should be OK.
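
    Before pinging, the address plan in step (1) can be sanity-checked offline. The sketch below uses Python's standard `ipaddress` module to confirm that two manually assigned addresses share a subnet under a given mask. The helper names are made up for this illustration. One thing it also shows: 1.1.1.x is publicly routable address space, so a private range (192.168.10.x is used here purely as an assumed example) may be a safer choice for a direct cable link.

```python
import ipaddress


def on_same_subnet(addr_a, addr_b, netmask):
    """Return True if addr_b falls inside the subnet that addr_a/netmask defines."""
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


def is_private(addr):
    """Return True if the address lies in a range reserved for private use."""
    return ipaddress.ip_address(addr).is_private


# Addresses from the post, with the 255.255.255.0 mask it describes.
print(on_same_subnet("1.1.1.1", "1.1.1.2", "255.255.255.0"))
print(is_private("1.1.1.1"))

# Hypothetical alternative pair taken from a private range.
print(on_same_subnet("192.168.10.1", "192.168.10.2", "255.255.255.0"))
print(is_private("192.168.10.1"))
```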

     

    (2) Set Qmaster to use this dedicated network interface.

    In Compressor.app, go to Apple Qmaster (app menu) > Share This Computer > Advanced tab > Network Settings: (1) untick "Discover over Bonjour" and (2) select the Ethernet interface from the list box. Optionally, use a different band of ports, say (61000, 300).
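
    If a custom band of ports is used, a quick script can show which ports the band covers and whether a given service port on the other host answers. This is a minimal sketch that reads "(61000, 300)" as a starting port plus a count, which is an assumption about what those two fields mean. The helper names are hypothetical, and the probe is a plain TCP connect, not anything Qmaster-specific.

```python
import socket


def port_band(first_port, count):
    """Ports covered by a band that starts at first_port and spans count ports."""
    return range(first_port, first_port + count)


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


band = port_band(61000, 300)
print("first port:", band[0], "last port:", band[-1])

# Example probe of the other host (uncomment on a machine that is on the link):
# print(is_port_open("1.1.1.2", 61000))
```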

     

    (3) Also, optionally, use another directory or file system folder for the Qmaster storage.

    • I use one on a disk array (a separate file system, not /var) that is directly connected to a Mac Pro; the other nodes have it NFS-mounted (shared over a similar dedicated interface, as above).
    • Customise your transcoding cluster as you see fit: a Managed Cluster (via Apple Qmaster Administrator), a QuickCluster, or "computer plus" (service nodes only).
    • Lately I have found that a Managed Cluster is very robust. Use Apple Qmaster Administrator.app for this.
    • I found that QuickCluster is not what it was (that's just me). I now use a Managed Cluster for the Mac minis and the Mac Pro. Works great!

    There are a few procedures documented on this Compressor 4 forum showing how to see whether the cluster is up and OK. It sounds like you know what you're doing, so there is no need for me to go into this.

    (4) Crank up the cluster (tick "Share This Computer", etc.).

    • Make sure the cluster is up and that it sees the service nodes on the current host. Use Activity Monitor. Also "gear down" on the Compressor icon in the Dock of each host and see that the services are started, all of them. You will see that each service has its dedicated IP address and port (e.g. 1.1.1.1:61610).
    • You can do the same on the other iMac host.
    • Also, as you have done, use Apple Qmaster Administrator to see the service nodes on each host. You should see that each service has a discrete IP address and port (e.g. 1.1.1.2:61403), one for each of the services you have set up.

    (5) Mount the cluster storage for that cluster on each host. It should mount over the dedicated subnet as above.

    Use Compressor.app > File > Mount Cluster Storage. You will see it, and others if you have them defined, in a list box. If it doesn't show up, it means the cluster is NOT initialised. So do it all again; eventually it will be very stable.

    (6) Mount all the file systems that hold the source directories and the distribution directories on ALL systems in your transcoding cluster.

    • What we're looking for is a workflow that avoids the preprocessing (preflight) step of copying source elements to the cluster, if at all possible.
    • This elongated workflow is exacerbated by a slow network such as the one Russ described in a previous post (Wi-Fi). This is why a dedicated network for NFS and for Qmaster communications seems to be far more robust and faster (as usual, always subject to taste!).

    (7) Use Compressor.app > Preferences > "Cluster Options".

    • Try "NEVER COPY TO CLUSTER" first.
    • See if you can obtain a result. If the job sits in waiting, change it back to "COPY AS NEEDED". Yeah, this is sad, but I don't know why this happens sometimes.
    • You may ALSO try "NEVER COPY TO/FROM CLUSTER" once you get it all working. Obviously this removes the need to use the cluster for making instances of the source elements for the transcode, and it also removes the copy operation from the cluster to the final distribution destination.
    • Again, this takes some trial and error; use the setting that suits you. I've had mixed results that have typically been based on some new software update activity "and where the moon is positioned at the time of day over Hong Kong".

    (8) Time to assemble the parts

    It is also true that a large proportion of the resources and time consumed by a distributed, segmented, multipass transcode across a transcode cluster goes into ASSEMBLING the parts for the final distribution. You can avoid this in part by not being too aggressive with the number of instances (services) you run on your host, unless you have a dedicated storage area network or an extremely fast set of network file systems.

    • I've found that giving 80% or even more of the available cores to the transcoding cluster services does not bog down a transcode, as long as there is sufficient memory.
    • Again, this is subjective and various rules of thumb prevail, so go lower at first. On this Mac Pro (Nehalem), I use 12 vcores, and 6 vcores on each of the Mac mini i7s. The only time I reduce the number of services is when I drop a Motion project in there. This (Motion.app render) eats RAM, so even with the memory I have, paging does occur (a topic for another day).
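
    The two rules of thumb in this thread (half the cores, or about 80% of them) are easy to compare side by side. The sketch below is an illustration only: the helper name is made up, and the 16 and 8 logical-core counts are assumptions chosen for the example, since the post gives instance counts rather than core counts.

```python
import math
import os


def suggested_instances(logical_cores, fraction):
    """Number of instances when a given fraction of logical cores is handed over.

    Always returns at least 1, and rounds down so the OS keeps the remainder.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return max(1, math.floor(logical_cores * fraction))


# Assumed core counts, for illustration only.
for cores in (8, 16):
    half = suggested_instances(cores, 0.5)
    eighty = suggested_instances(cores, 0.8)
    print(f"{cores} logical cores: half rule = {half}, 80% rule = {eighty}")

# The same calculation for whatever machine this runs on.
local_cores = os.cpu_count() or 1
print("this machine, half rule:", suggested_instances(local_cores, 0.5))
```

    Starting from the lower figure and raising it while watching memory use follows the "go lower at first" advice above.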

     

    (9) Overall service time is only as good as the processing of the slowest node.

    • Yep, common sense. There are several ways to deal with this. You might have to set up a set of unmanaged services and other clusters depending on the work you want to do (transcoding, rendering and Motion rendering, for example).
    • Many people have hooked up the wife's and kids' iMacs and MacBook Pros to a network hub and run Qmaster over all of them, with very, very slow transcodes as a result.
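
    Point (9) can be put into numbers with a very small model: if segments are shared out evenly, the job finishes when the slowest node finishes. The node names, segment counts and per-segment times below are all hypothetical, and the model ignores any dynamic rebalancing that Qmaster may do.

```python
def job_finish_minutes(segments_per_node, minutes_per_segment):
    """Finish time of the whole job: the node that takes longest sets the pace."""
    return max(
        segments_per_node[node] * minutes_per_segment[node]
        for node in segments_per_node
    )


# Hypothetical per-segment speeds for three machines.
speed = {"macpro": 2.0, "mini": 3.0, "laptop": 10.0}

# 12 segments split evenly over all three nodes.
with_laptop = job_finish_minutes({"macpro": 4, "mini": 4, "laptop": 4}, speed)

# The same 12 segments with the slow laptop left out.
without_laptop = job_finish_minutes({"macpro": 6, "mini": 6}, speed)

print("with the slow node:", with_laptop, "minutes")
print("without the slow node:", without_laptop, "minutes")
```

    In this made-up case, dropping the slow machine shortens the job even though fewer nodes are working, which is the situation described in the second bullet.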

     

    (10) Recent workflows using Compressor in a distributed, segmented, multinode transcode cluster (anecdote/experience)

    • I've found that I'm able to drop many hundreds of clips of 5D, iPhone, etc. footage into a Managed Cluster for transcoding into ProRes, so that I can avoid the overhead of FCPX performing this transcode during import. In this circumstance, Compressor.app v4 does a super job of transcoding these into ProRes.
    • My workflow is then simply to import the ProRes files into an Event in FCPX, and off we go.

    (11) Things to watch during distributed, multipass, segmented transcoding:

    • Watch the logs. Use Share Monitor.app to look initially at each segment.
    • You will notice that Qmaster often does a great job of restarting failed segments, and they are eventually transcoded to completion.
    • Try to avoid COPYING the source (see above); this takes time and adds to the service time of the job.
    • Monitor any activity and the times when you see "preparing to copy source". You can easily see these in the logs. Use Console.app; there is a cool "Find" in there now.
    • See that your I/O over your dedicated network is as fast and efficient as you can make it. It's not the act of WRITING the distribution or the parts, it's the READING that is slow.
    • Watch the transcode elapsed time for the segments on the slowest host. This will be make or break. As I stated earlier, this often really slows up the job. Using a Managed Cluster, you can DYNAMICALLY add and take away not only a full host node but also a single instance, using Apple Qmaster Administrator.
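
    For anyone who prefers a script to the Find box in Console.app, the same search can be done over exported log text. This is a minimal sketch: the phrase is the one quoted above, but the sample lines are invented for the example and do not represent real Qmaster log output, whose format is not shown in this thread.

```python
def lines_mentioning(log_lines, phrase="preparing to copy source"):
    """Return the log lines that contain the phrase, ignoring letter case."""
    needle = phrase.lower()
    return [line.rstrip("\n") for line in log_lines if needle in line.lower()]


# Invented sample lines, for illustration only.
sample_log = [
    "10:01:12 job 1 segment 3 started\n",
    "10:01:15 job 1 Preparing to copy source\n",
    "10:04:40 job 1 segment 3 finished\n",
    "10:04:41 job 2 preparing to copy source\n",
]

hits = lines_mentioning(sample_log)
print(len(hits), "copy events found")
for hit in hits:
    print(hit)
```

    To run it on a real export, pass an open file object in place of `sample_log`.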

     

    Post your results for others to see.

    HTH

    Warwick

    Final Cut Pro X, OS X Mountain Lion (10.8.2), Motion5 , Compressor 4
  • Russ H Level 6 (12,845 points)
    Jan 21, 2013 6:30 AM (in response to Rex Ross)

    Hey Rex.

    Rex Ross wrote:

    Compressor time was 50 minutes to render with NO cluster, i.e. using just "This Computer".

    Compressor time was 33 minutes to render with a QuickCluster on one iMac using 4 of the 8 cores.

    That was simple to set up and offered some time savings.

    And that's within the range of savings I might have expected.

    So once again, thanks to all. I think a lot of people will be interested in the good information you guys posted on this thread.

    Thanks for posting the results, which I think many people will also appreciate.
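
    The figures quoted in this thread can be turned into a speedup and a percentage saved with a couple of lines of arithmetic. The sketch below uses only the times reported above (50 vs 33 minutes for the 4-instance QuickCluster, and 5 vs 11 minutes for the original 16-core test); the function names are just illustrative.

```python
def speedup(baseline_minutes, cluster_minutes):
    """How many times faster the cluster run was (below 1 means slower)."""
    return baseline_minutes / cluster_minutes


def percent_time_saved(baseline_minutes, cluster_minutes):
    """Share of the baseline time that the cluster run saved, in percent."""
    return 100.0 * (baseline_minutes - cluster_minutes) / baseline_minutes


# QuickCluster with 4 instances on one iMac: 50 minutes down to 33.
print(round(speedup(50, 33), 2))             # 1.52
print(round(percent_time_saved(50, 33), 1))  # 34.0

# Original two-iMac test: 5 minutes on one machine, 11 on the cluster.
print(round(speedup(5, 11), 2))              # 0.45
```

    A speedup of about 1.5 from 4 instances is well short of 4, which fits the earlier remarks about reassembly and other overhead eating into the gain.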

     

    Russ

  • David M Brewer Level 6 (9,180 points)
    Jan 21, 2013 7:52 AM (in response to Rex Ross)

    To add to...

    What computer are you using? How much RAM (the more RAM the better)? What is the video card (the more VRAM the better)? What codec are you using?

    You're having to join 2.24 gigs back together (4 instances). This can take up to 25-50% added time.

    If you're using ProRes, turn off Fast Start. If you're planning to upload the video to the web or to a service like YouTube, turn off frame reordering when compressing to H.264. If the H.264 is not going to the web, turn off Fast Start. This will save 25-50% in encoding time.

    If you're using clusters, it doesn't matter whether Fast Start is on or off.

    Open Activity Monitor and use it to check how much RAM and computing power you're using when encoding.

    [Image: Screen Shot 2013-01-21 at 8.43.44 AM.png]

    CPU% of 125-175% is a good range for each instance. 600 MB to 1 gig per instance is good too. For checking the VRAM, the only app I know of that does this is iStat Menus.
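
    Readings taken from Activity Monitor can be checked against the ranges given above with a few comparisons. This is a small sketch under stated assumptions: the readings are typed in by hand, "1 gig" is taken as 1024 MB, and the helper names are made up for the example.

```python
CPU_RANGE_PERCENT = (125, 175)   # per-instance range quoted in the post
MEMORY_RANGE_MB = (600, 1024)    # assumption: "1 gig" treated as 1024 MB


def check_instance(cpu_percent, memory_mb):
    """Report whether one instance's readings fall inside the quoted ranges."""
    cpu_ok = CPU_RANGE_PERCENT[0] <= cpu_percent <= CPU_RANGE_PERCENT[1]
    memory_ok = MEMORY_RANGE_MB[0] <= memory_mb <= MEMORY_RANGE_MB[1]
    return {"cpu_ok": cpu_ok, "memory_ok": memory_ok}


def peak_memory_mb(instances, memory_mb_per_instance=1024):
    """Rough RAM budget if every instance reaches the per-instance figure."""
    return instances * memory_mb_per_instance


# Hand-entered example readings (hypothetical).
print(check_instance(cpu_percent=150, memory_mb=800))
print(check_instance(cpu_percent=60, memory_mb=1500))
print("4 instances at 1 gig each:", peak_memory_mb(4), "MB")
```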
