
Qmaster Cluster Tests

From all reports (and confirmed by my experience) Qmaster Clusters in Compressor 4 leave a lot to be desired.


After several hours of puttering around, I finally got clusters to work. Here is the setup.


Two iMac 27 systems, each with 8 cores.


Cluster was set to use all 16 cores.


Video to be run through Compressor 4 was 5 minutes of 720p.


Test using cluster with all 16 cores took 11 minutes to render. (And I was able to monitor all 16 cores in action with Qmaster Admin panel.)


Test using This Computer Only took 5 minutes to render.


Not quite what I expected.


Perhaps the network overhead of distributing a 5 minute video made up the difference.


I am rerunning the test today with a 30-minute video to compare timing.


Does anyone see anything I may be missing here?


RR

iMac (27-inch Mid 2010), Mac OS X (10.6.8), + 21 inch Cinema Display

Posted on Jan 20, 2013 8:09 AM

18 replies

Jan 20, 2013 9:24 AM in response to Rex Ross

Rex,


The Wi-Fi connection will definitely be a factor in the longer-than-expected times.


Here is an alternative test. Create a Quick Cluster (one iMac) with four instances. Your machine has hyper-threading, so four should be about ideal; 16 instances was probably excessive. (Apple's advice is half the cores for maximum efficiency.) For one thing, the OS and app need some of the resources. For another, after the file gets chopped up into all those segments and processing is completed, some time is needed to re-assemble them. Re-assembling 16 segments will presumably be more involved than, say, 4.


Post back with results and/or further insights.

Thanks.


Russ

Jan 20, 2013 9:55 AM in response to Russ H

Will give it a try.


A quick suggestion or two regarding Quick Clusters would be helpful. I have not used them, and trial and error with Qmaster is wearing me down.


Here is a quick test setup on a spare MacBook Pro I have.


Presumably I select Share This Computer and set the parameters there.


Anyway, I did that, but it never seemed to give me an opportunity to drag cores into the main Qmaster window.


I must be missing something.


Suggestions?

Jan 20, 2013 10:04 AM in response to Rex Ross

In the Setup window, next to Compressor, click Options to select the number of instances. You shouldn't have to do anything else. When you hit Submit, you should get a choice of This Computer or your Quick Cluster.


BTW, discussion boards of all types are filled with frustrated threads about trying to set up distributed processing; you are not alone.


Good luck with the QC.


Russ

Jan 20, 2013 11:26 PM in response to Rex Ross

Hi Rex, here are a couple of extra things you might like to consider when setting up your cluster, in addition to the notes already in this thread. I want to put these experiences and ideas down because it seems people are really having issues using this wonderful component, Compressor/Qmaster.


Your Qmaster configuration (across your two iMacs):

(1) Make and dedicate a separate subnet for your Qmaster nodes between the two iMacs.

  • Try to use this subnet for all your Qmaster traffic between the hosts, without mixing in other "internet" activity.
  • Connect a Cat6 Ethernet cable directly from the Ethernet port of one iMac (host A) to the Ethernet port of the other (host B).
  • In Network preferences, give the Ethernet port (en0) on each Mac a manual, dedicated IP address, such as 1.1.1.1 on one and 1.1.1.2 on the other.
    In the Advanced settings for the Ethernet interface, under Hardware, make sure the speed is 1000baseT. And yes, set the subnet mask to the usual 255.255.255.0.
  • If you can, set the packet size to jumbo frames with an MTU of 9000 (maybe you can't; it seems hit and miss 😕). Jumbo frames mean fewer network I/Os.
  • Ping each host from the other over those IP addresses to make sure the path is there.
  • In Network preferences, order the interfaces so that your day-to-day internet Wi-Fi is at the top.
  • To access your normal "internet", use the Wi-Fi on each Mac.
  • That should all be OK.
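To double-check the addressing in step (1) before pinging, here is a minimal Python sketch (a standalone helper, not part of Qmaster) that tests whether two hosts fall on the same network for a given subnet mask. It uses the example addresses from the post and assumes the 255.255.255.0 mask. As an aside, 1.1.1.1 is a publicly routed address, so a private range such as 192.168.x.x (see the `is_private` lines) would work the same way on a direct cable without shadowing a real internet host.

```python
import ipaddress

def same_subnet(addr_a: str, addr_b: str, netmask: str = "255.255.255.0") -> bool:
    """True if both addresses fall in the same network under the given mask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b

if __name__ == "__main__":
    # Host A and host B on the dedicated Ethernet link (addresses from the post).
    print(same_subnet("1.1.1.1", "1.1.1.2"))               # True
    print(same_subnet("1.1.1.1", "1.1.2.2"))               # False: different /24
    print(ipaddress.ip_address("1.1.1.1").is_private)      # False: public address
    print(ipaddress.ip_address("192.168.2.1").is_private)  # True: private range
```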


(2) Set Qmaster to use this dedicated network interface.

Go to Compressor.app > Apple Qmaster (app menu) > Share This Computer > Advanced tab > Network Settings, then (1) untick "Discover over Bonjour" and (2) select the Ethernet interface from the list box. Optionally, use a different band of ports, say (61000, 300).


(3) Optionally, also use another directory or file system folder for the Qmaster storage.

  • I use one on a disk array (a separate file system other than /var) that is directly connected to a Mac Pro; the other nodes have it NFS-mounted (shared over a similar dedicated interface, as above).
  • Customise your transcoding cluster as you see fit: a Managed Cluster (Apple Qmaster Administrator), a Quick Cluster, or "computer plus" (service nodes only).
  • Lately I have found that a MANAGED CLUSTER is very robust. Use Apple Qmaster Administrator.app for this.
  • I found that Quick Cluster is not what it was (that's just me). I now use a MANAGED cluster for the Mac minis and the Mac Pro, and it works great!


There are a few procedures documented on this Compressor 4 forum showing how to check whether the cluster is up and OK. It sounds like you know what you're doing, so no need for me to go into this.


(4) Crank up the cluster (tick "Share This Computer", etc.).

  • Make sure the cluster is up and that it sees the service nodes on the current host. Use Activity Monitor. Also drill down on the Compressor icon in the Dock of each host and check that the services are started, all of them. You will see that each service has its dedicated IP address and port (e.g. 1.1.1.1:61610).
  • You can do the same on the other iMac host.
  • Also, as you have done, use Apple Qmaster Administrator to see the service nodes on each host. You should see a discrete IP address and port (e.g. 1.1.1.2:61403) for each of the services you have configured.
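Once you have the list of service endpoints in front of you, a quick tally per host shows whether every configured service actually came up. The sketch below assumes you have copied the "ip:port" strings by hand (for example from Apple Qmaster Administrator); it does not talk to Qmaster itself. Only 1.1.1.1:61610 and 1.1.1.2:61403 come from the post; the other endpoint and the expected counts are hypothetical.

```python
from collections import Counter

def services_per_host(endpoints: list[str]) -> Counter:
    """Count "ip:port" service endpoints per host IP."""
    counts: Counter = Counter()
    for endpoint in endpoints:
        host, _, port = endpoint.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"not an ip:port endpoint: {endpoint!r}")
        counts[host] += 1
    return counts

def short_hosts(endpoints: list[str], expected: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Hosts showing fewer services than configured, as {host: (seen, expected)}."""
    seen = services_per_host(endpoints)
    return {host: (seen[host], want) for host, want in expected.items() if seen[host] < want}

if __name__ == "__main__":
    # Hypothetical transcription: two services expected on each iMac.
    endpoints = ["1.1.1.1:61610", "1.1.1.1:61611", "1.1.1.2:61403"]
    print(short_hosts(endpoints, {"1.1.1.1": 2, "1.1.1.2": 2}))  # {'1.1.1.2': (1, 2)}
```

A host that shows up in `short_hosts` is one where some services did not start, which is the same thing you would otherwise check by eye in the Dock menu or in Qmaster Administrator.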


(5) Mount the cluster storage for that cluster on each host. It should mount over the dedicated subnet as above.

Use Compressor.app > File > Mount Cluster Storage. You will see it (and others, if you have them defined) in a list box. If it doesn't show up, it means the cluster is NOT initialised, so do it all again. Eventually it will be very stable.


(6) Mount all the file systems that hold the source directories and the DISTRIBUTION directories ON ALL systems in your transcoding cluster.


  • What we're looking for is a workflow that avoids preprocessing (preflight) copying of source elements to the cluster, if at all possible.
  • This elongated workflow is exacerbated by a slow network, such as the Wi-Fi link Russ mentioned in a previous post. This is why a dedicated network for NFS and for Qmaster communications seems to be far more robust and faster (as usual, always subject to taste!).


(7) Use Compressor.app > Preferences > "Cluster Options".

  • Try "NEVER COPY TO CLUSTER" first.
  • See if you can obtain a result. If the job sits in waiting, change it back to "COPY AS NEEDED". Yeah, this is sad, but I don't know why this happens sometimes.
  • You may ALSO try "NEVER COPY TO/FROM Cluster" when you get it all working. This removes the need for the cluster to make copies of the source elements for the transcode, and also the copy operation from the cluster to the final distribution destination.
  • Again, it takes some trial and error; use the setting that suits you. I've had mixed results, typically depending on some new software update "and where the moon is positioned at that time of day over Hong Kong" 😉


(8) Time to assemble the parts

It is also true that a large proportion of a distributed, segmented, multipass transcode across a cluster is the resources and time consumed ASSEMBLING the PARTS for the final distribution. You can avoid this in part by not being too aggressive with the number of instances (services) you run on each host, unless you have a dedicated storage area network (SAN) or an extremely fast set of network file systems.


  • I've found that giving 80% or even more of the available cores to transcoding cluster services doesn't bog down a transcode, as long as there is sufficient memory.
  • Again, this is subjective and various rules of thumb prevail, so go lower at first. On this Mac Pro (Nehalem) I use 12 vcores, and 6 vcores on each of the Mac mini i7s. The only time I reduce the number of services is when I drop a Motion project in there. A Motion.app render eats RAM, so even with the memory I have, paging does occur (a topic for another day).
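The core-count rules of thumb in this thread (the half-the-cores advice Russ quotes earlier and the 80% figure above) are easy to turn into numbers. This hypothetical helper just takes a share of the logical cores and rounds down; it ignores memory, which is the limit Warwick runs into with Motion renders. The 8-core figure is the per-iMac count from the original post.

```python
import math

def suggested_instances(logical_cores: int, fraction: float) -> int:
    """Number of Compressor instances for a given share of logical cores (at least 1)."""
    if logical_cores < 1 or not 0 < fraction <= 1:
        raise ValueError("need logical_cores >= 1 and 0 < fraction <= 1")
    return max(1, math.floor(logical_cores * fraction))

if __name__ == "__main__":
    print(suggested_instances(8, 0.5))  # 4: half the cores
    print(suggested_instances(8, 0.8))  # 6: 80% of 8 is 6.4, rounded down
```

Rounding down is a deliberately conservative choice, in line with the "go lower at first" advice.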


(9) Overall service time is only as good as the processing of the slowest node.

  • Yep, common sense. There are several ways to deal with this. You might have to set up a set of unmanaged services and other clusters depending on the work you want to do (e.g. transcoding, rendering and Motion rendering).
  • Many people have hooked up the wife's and kids' iMacs and MacBook Pros to a network hub and run Qmaster over all of them, with very, very slow transcode results.
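Point (9) can be made concrete with a toy model. Assume (hypothetically) that each node works through its share of segments one after another, and ignore copying and reassembly; the transcode phase then finishes when the node with the most work per unit speed does. All node names and timings below are made up for illustration.

```python
def job_minutes(segments: dict[str, int], minutes_per_segment: dict[str, float]) -> float:
    """Finish time of the transcode phase: the node that takes longest sets the pace."""
    return max(segments[node] * minutes_per_segment[node] for node in segments)

if __name__ == "__main__":
    fast = {"imac_a": 2.0, "imac_b": 2.0}  # minutes per segment (hypothetical)
    with_laptop = {**fast, "old_laptop": 6.0}
    # 9 segments spread evenly over three nodes, slow laptop included
    print(job_minutes({"imac_a": 3, "imac_b": 3, "old_laptop": 3}, with_laptop))  # 18.0
    # The same 9 segments on the two fast nodes only
    print(job_minutes({"imac_a": 5, "imac_b": 4}, fast))  # 10.0
```

In this model, dropping the slow machine shortens the job even though fewer machines are working, which is the effect described above.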


(10) Recent workflows using Compressor in a distributed, segmented, multinode transcode cluster (anecdote/experience)

  • I've found that I'm able to drop many hundreds of 5D, iPhone, etc. clips into a managed cluster for transcoding into ProRes, so that I can avoid the overhead of FCP X performing this transcode during import. In this circumstance, Compressor.app v4 does a super job transcoding these into ProRes.
  • My workflow is then simply to import the ProRes files into an Event in FCP X, and off we go.


(11) Things to watch during distributed multipass segmented transcoding:

  • Watch the logs. Use Share Monitor.app to look initially at each segment.
  • You will notice that Qmaster often does a great job of restarting failed segments, and they are eventually transcoded to completion.
  • Try to avoid copying the source (see above); this takes time and adds to the service time of the job.
  • Monitor activity and note the times when you see "preparing to copy source". You can easily see these in the logs: use Console.app, which now has a handy Find feature.
  • Make sure your I/O over the dedicated network is as fast and efficient as you can make it. It's not the act of WRITING the distribution or the parts that is slow, it's the READING.
  • Watch the elapsed transcode time for the SEGMENTS on the slowest host. This will be make or break; as I stated earlier, it often really slows up the job. Using a managed cluster, you can DYNAMICALLY add and remove not only a full host node but also a single instance, using Apple Qmaster Administrator.
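For the "preparing to copy source" check in (11), a few lines of Python can pull the matching entries out of a log you have saved from Console.app. This is a sketch: it assumes a plain-text log file whose path you pass on the command line (the file name in the usage comment is hypothetical), and it only matches the phrase quoted in the post, case-insensitively; the exact wording in your logs may differ.

```python
import sys

PHRASE = "preparing to copy source"  # phrase quoted in the post; adjust to your logs

def copy_source_lines(lines):
    """Yield (line_number, text) for every log line that mentions copying the source."""
    for number, line in enumerate(lines, start=1):
        if PHRASE in line.lower():
            yield number, line.rstrip("\n")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_copy_source.py saved_qmaster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in copy_source_lines(log):
            print(f"{number}: {text}")
```

Comparing the line positions (or any timestamps your log carries) of these entries against the segment start times gives a rough idea of how much of the job went into copying.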


Post your results for others to see.


HTH


Warwick









Jan 21, 2013 6:11 AM in response to Warwick Teale

Warwick & Russ H


I really appreciate the advice and suggestions.


Warwick, I think your outlined methodology looks very good. However, I have sort of run out of gas on using clusters, and since most of my videos are in the range of 5-15 minutes, the payback from using clusters is fairly minimal for me.


If I move into longer videos (30+ minutes), the dedicated subnet approach should give great returns on the effort of setting up that structure.


One last test note.


I used a 30-minute 720p test video in FCP X.


The Compressor setting used was the Video Sharing 720p preset.


Compressor time was 50 minutes to render with NO cluster, i.e. using just "This Computer".


Compressor time was 33 minutes to render with a Quick Cluster on one iMac using 4 of the 8 cores.


That was simple to set up and offered some time savings.
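For comparing runs like these, it helps to express them as a speedup (baseline time divided by new time) and as the share of time saved. The small sketch below plugs in the numbers from this thread: 50 vs 33 minutes for the 30-minute video, and 5 minutes on This Computer vs 11 minutes on the two-iMac cluster for the 5-minute video.

```python
def speedup(baseline_min: float, new_min: float) -> float:
    """Times faster than the baseline; below 1.0 means the new setup was slower."""
    return baseline_min / new_min

def time_saved(baseline_min: float, new_min: float) -> float:
    """Fraction of the baseline time saved (negative if the new setup was slower)."""
    return 1 - new_min / baseline_min

if __name__ == "__main__":
    print(f"{speedup(50, 33):.2f}x, {time_saved(50, 33):.0%} saved")  # 1.52x, 34% saved
    print(f"{speedup(5, 11):.2f}x")                                    # 0.45x
```

By that measure the Quick Cluster run was about 1.5x faster, while the first two-iMac cluster test ran at less than half the speed of a single machine.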


I did try to set up a cluster using 4 cores on each of the two iMacs, but had continuing problems with that and decided to forget it for now.


So once again, thanks to all. I think a lot of people will be interested in the good information you guys posted on this thread.

Jan 21, 2013 6:30 AM in response to Rex Ross

Hey Rex.



Rex Ross wrote:




Compressor time was 50 minutes to render with NO cluster. I.e. using just "This Computer"


Compressor time was 33 minutes to render with a quickcluster on one iMac using 4 of the 8 cores.


That was simple to set up and offered some time savings.


And that's within the range of savings I might have expected.



So once again, thanks to all. I think a lot of people will be interested in the good information you guys posted on this thread.

Thanks for posting the results, which I think many people will also appreciate.


Russ

Jan 21, 2013 6:50 AM in response to Russ H

By the way, in cleaning up my files to get rid of my cluster test stuff, I noticed that the final file generated by the 4-core cluster test was 2.25 GB, while the final file generated by the no-cluster test was 2.21 GB.


Both had the same duration and pixel dimensions.


The difference was that the 4-core cluster test produced a file with a bitrate of 9.92 Mbit/s, while the no-cluster test produced a file with a bitrate of 9.73 Mbit/s.


As the only difference in the workflow was the cluster vs no cluster, it is not really clear to me why the final files are not virtually identical.


Not a really huge difference, but one more mystery to add to the Cluster Saga.


RR
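As a sanity check on those numbers: file size is roughly bitrate × duration ÷ 8 bits per byte. The sketch below assumes the test video is exactly 30 minutes long and that the sizes are decimal gigabytes (as the Finder reports them); both are assumptions.

```python
def estimated_size_gb(bitrate_mbit_s: float, duration_s: float) -> float:
    """Approximate file size in decimal GB: bits/s x seconds / 8 bits per byte."""
    return bitrate_mbit_s * 1e6 * duration_s / 8 / 1e9

DURATION_S = 30 * 60  # assumption: exactly 30 minutes

if __name__ == "__main__":
    for label, mbit_s, reported_gb in [("4-core cluster", 9.92, 2.25),
                                       ("no cluster", 9.73, 2.21)]:
        estimate = estimated_size_gb(mbit_s, DURATION_S)
        print(f"{label}: estimated {estimate:.2f} GB, reported {reported_gb:.2f} GB")
```

Both estimates (2.23 GB and 2.19 GB) come out about 0.02 GB below the reported sizes, and the size ratio (2.25 / 2.21 ≈ 1.02) matches the bitrate ratio (9.92 / 9.73 ≈ 1.02), so the bigger file is simply the result of the slightly higher bitrate. If the reported bitrate covers video only, audio and container overhead could account for the small remainder; that is a guess, not something the thread confirms.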

Jan 21, 2013 7:52 AM in response to Rex Ross

To add to...


What computer are you using? How much RAM (the more RAM, the better)? What is the video card (the more VRAM, the better)? What codec are you using?


You're having to join 2.24 GB back together (4 instances). This can add 25-50% to the time.


If you're using ProRes, turn off Fast Start. If you're planning to upload the video to the web or to a service like YouTube, turn off frame reordering when compressing to H.264. If the H.264 is not going to the web, turn off Fast Start. This will save 25-50% in encoding time.


If you're using clusters, it doesn't matter whether Fast Start is on or off.


Open Activity Monitor and use it to check how much RAM and CPU power you're using when encoding.



A CPU% of 125-175% is a good range for each instance, and 600 MB to 1 GB of RAM per instance is good too. For checking VRAM, the only app I know of that does this is iStat Menus.
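David's rules of thumb can be turned into a quick checklist for the numbers you read off Activity Monitor. The sketch below is hypothetical: you type in each instance's CPU % and memory yourself, and the thresholds (125-175% CPU, 600 MB to 1 GB RAM per instance, with 1 GB taken as 1024 MB) are his guidance rather than hard limits.

```python
def instance_warnings(cpu_percent: float, memory_mb: float) -> list[str]:
    """Flag readings outside the per-instance ranges suggested in the post."""
    warnings = []
    if not 125 <= cpu_percent <= 175:
        warnings.append(f"CPU {cpu_percent:.0f}% outside 125-175%")
    if not 600 <= memory_mb <= 1024:
        warnings.append(f"memory {memory_mb:.0f} MB outside 600-1024 MB")
    return warnings

def worst_case_ram_gb(instances: int, per_instance_gb: float = 1.0) -> float:
    """Upper-end RAM estimate for all instances together."""
    return instances * per_instance_gb

if __name__ == "__main__":
    print(instance_warnings(150, 800))  # []
    print(instance_warnings(100, 700))  # ['CPU 100% outside 125-175%']
    print(worst_case_ram_gb(4))         # 4.0, e.g. a 4-instance Quick Cluster
```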

Jan 21, 2013 10:34 AM in response to David M Brewer

David,


Thanks for the note.


I am using a 27-inch iMac, the new Late 2012 thin-bezel model.


3.4 GHz Core i7

32 GB RAM

NVIDIA GeForce GTX 675MX 1024 MB


Anyway, my final conclusion is that since virtually all of my videos are less than 15 minutes running time, the hassle of dealing with clusters is just not worth the effort.


Whether it takes 10 minutes or 20 minutes for Compressor to handle my output is of no concern to me.


So I will likely not revisit clusters unless and until I move into videos that are more than about 30-45 minutes long.


RR

Jan 28, 2014 4:27 AM in response to Rex Ross

Dear Rex Ross,


This is the first discussion I've found that is close to my problem:

Have you ever tried to use the second iMac as an external display for the primary iMac while using both in a cluster? I just bought a "new" iMac to replace my older model, but the price I can get for the old one is too low, so I am not willing to sell it. So I am thinking about selling my 23" Apple Cinema Display (ACD) and using the second Mac as a monitor instead, AND using its processor for clustering in OS X Mavericks. Have you ever tried to use it like this? If not, have you found a good source where I can start my research?
