
Transitioning to Metal with 10.4.8 - Benchmarks & Observations

I want to preface this post by saying that these tests are representative of a very specific workflow as it applies to how I currently use Final Cut Pro X, and they should not be seen as a general overview. Despite this, I hope some of you find this information useful. 


Host machine: 2018 Mac mini - 3.0GHz 6-core i5 - 32GB of RAM - Intel UHD Graphics 630

eGPU: Razer Core X with an MSI Radeon Vega 56 8GB

Display: BenQ BL2420PT at 2560x1440 using HDMI input

Working storage: SanDisk Extreme Portable 2TB SSD using USB-C (500/475 MB/s)


Source footage is 200 Mbps H.264 video from an Insta360 EVO 360/180 camera with a resolution of 5760x2880 (5.7K). Timeline Playback tests were conducted with Optimized or Proxy Media. Shared and playback timelines were completely Unrendered (more on that below).


The Shared Project is a 4 minute, 1:1 ratio, 360-degree over/under stereoscopic video with a resolution of 5760x5760, equivalent to the pixel count of an 8K video. In order to conform two 180-degree stereoscopic videos to a 360-degree stereoscopic timeline, it contains four 5.7K video layers, each with a transform, custom mask, and colour correction. There is also a PNG graphic with an alpha layer over the full timeline. Everything is placed in a compound clip with a reorient transform applied. This is the timeline that is Shared and used for the Timeline Playback tests.
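As a rough sanity check on the "equivalent to 8K" claim, the pixel counts can be compared directly. This is only a sketch: it assumes "8K" means 8K UHD (7680x4320), which the post does not state, and `pixel_count` is a hypothetical helper name.

```python
def pixel_count(width, height):
    """Total number of pixels in a frame of the given dimensions."""
    return width * height

# Timeline resolution quoted in the post.
timeline = pixel_count(5760, 5760)

# Assumption: "8K" refers to 8K UHD, 7680x4320.
uhd_8k = pixel_count(7680, 4320)

print(f"timeline: {timeline:,} px")
print(f"8K UHD:   {uhd_8k:,} px")
print("same pixel count:", timeline == uhd_8k)
```

Under that assumption the two frame sizes hold the same number of pixels, just in a different aspect ratio.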


Tests were conducted with the pre-Metal Final Cut Pro X 10.4.6 and the Metal-enhanced Final Cut Pro X 10.4.8, with the display connected either directly to the Mac mini via HDMI (Built-in Display) or to the eGPU HDMI output (eGPU External Display).


All utilization percentages are observed averages and are for approximate reference only.



Personal Observations & Takeaway


With a few small exceptions and one big one (more on that below) the performance difference between 10.4.6 and 10.4.8 in this particular setup is minimal. For whatever reason though, Stabilization Analysis is consistently about 40% faster and Optimized Media Transcode is about 30% faster with 10.4.6.


The biggest deciding factor is Timeline Playback for Unrendered timelines when using the Metal-enhanced 10.4.8. You must connect your display to the eGPU (and set it to primary or mirrored) to see eGPU realtime playback acceleration. This will probably frustrate most iMac users and some MacBook users who are completely content with their built-in displays. I also prefer not to run the eGPU 24/7, and I enjoy the convenience of powering it on to get the benefit on demand without having to disconnect and reconnect cables. The pre-Metal 10.4.6 has no such requirement: you see the full benefit with your built-in display.


One could make the case to turn on Background Rendering, but that simply is not the same as having an immediate real-time preview of what you are working on. And in my particular case, 360/180 footage is often a series of long continuous clips, which means every change would trigger a render reset for the whole clip.


A bit of a discovery during this process came when I accidentally ran the project Share test without using Optimized Media, so the original H.264 assets were used instead. In every single case, the Optimized Media timeline took about 30% longer! Not sure what’s at play here, but maybe someone else could shed some light on this. This is really not an issue for me since I use Proxy Media while working anyway.


Smaller Observations


Final Cut Pro X 10.4.6 performance is virtually identical whether using the built-in display or a display connected to the eGPU - in both task runtime and hardware utilization.


For Optimized and Proxy Media transcoding (H.264 source), the eGPU is employed almost equally but the CPU load is cut almost in half for Proxy transcoding. Interestingly, with 10.4.6 the task runtimes are identical for both.


Oddly, timeline playback of half-resolution Proxy Media (2880x1440) is twice as taxing on the eGPU when compared to full-resolution Optimized Media running with Best Performance. Perhaps Best Performance steps down to ¼ resolution?
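For reference, the pixel arithmetic behind that comparison can be sketched as follows. It only compares frame sizes using the resolutions quoted in this post; it says nothing about what Best Performance actually does internally, and `pixel_count` is a hypothetical helper name.

```python
def pixel_count(width, height):
    """Total number of pixels in a frame of the given dimensions."""
    return width * height

# Resolutions quoted in the post.
source = pixel_count(5760, 2880)  # full-resolution 5.7K source
proxy = pixel_count(2880, 1440)   # half-resolution Proxy Media

# Halving both width and height leaves a quarter of the pixels.
ratio = proxy / source

print(f"source: {source:,} px")
print(f"proxy:  {proxy:,} px")
print(f"proxy / source = {ratio}")
```

So a "half-resolution" proxy already carries one quarter of the source pixels per frame, which makes the higher eGPU load during Proxy playback all the more surprising.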


Stabilization Analysis only used the internal GPU with 10.4.8 and did not push CPU usage much beyond 50%. This resulted in 40% slower performance when compared to 10.4.6 which used some eGPU but over 90% CPU utilization.


Mac mini 2018 or later

Posted on Dec 31, 2019 9:56 AM


4 replies

Jan 5, 2020 12:39 PM in response to MZTVPD

Very interesting analysis and a very helpful spreadsheet.

My Mac Pro 5,1 is still on Sierra and, given it has the Mac Edition Graphics Card, I was ready to install Catalina. However, this is not possible according to https://support.apple.com/en-us/HT207877


The comparisons between 10.4.6 and 10.4.8 are interesting, so I'll bide my time using Sierra and the older FCP X version. I will investigate work-arounds using a MacBook Pro with Catalina. Copying files over? Networking? Installing a hack? I don't have time to deal with it now, so I will continue doing edits on the MP and, just maybe, the MBP on occasion.

Dec 31, 2019 10:33 AM in response to MZTVPD

In 10.4.6 Metal acceleration was not yet in place; 10.4.7 then introduced changes, as per the release notes:

Final Cut Pro 10.4.7

Released October 7, 2019

  • New Metal-based processing engine improves playback and accelerates graphics tasks, including rendering, compositing, real-time effects, exporting, and more.

What this note means is that Final Cut will now use Intel Quick Sync, a feature of the CPU, for hardware decoding and encoding. This typically invokes the on-board GPU so as to maximise the bandwidth. There are instances where playback in Final Cut Pro grinds to a halt with original media; this happens predominantly in 4K situations and is due to the 1.5 GB limitation of shared memory given to the eGPU.

As for analysis, that is made up of decoding and CPU computation.


So what is changing here is mostly down to the change in playback, which in your specific use is somewhat penalising but for the majority of FCP users without an eGPU is an improvement.

Bear in mind that analysis and transcode are not the focus of the eGPU in the FCP design.


What I really do not understand about your ProRes playback: is it the case that you switched off background rendering, so that the optimised or proxy media are not rendered?

Also, I understand the media is on an external SSD, but where are the FCP library, the cache, and the optimised media?

In general, ProRes is a matter of disk speed, not processing; any unit can play back ProRes.

Jan 8, 2020 4:13 AM in response to MZTVPD

A bit of a discovery during this process was when I accidentally ran the project Share test without using Optimized Media, and instead the original H.264 assets were used. In every single case, the Optimized Media timeline took about 30% longer! Not sure what’s at play here, but maybe someone else could shed some light on this. This is really not an issue for me since I use Proxy Media while working anyway.


Smaller Observations


Final Cut Pro X 10.4.6 performance is virtually identical whether using the built-in display or a display connected to the eGPU - in both task runtime and hardware utilization.


For Optimized and Proxy Media transcoding (H.264 source), the eGPU is employed almost equally but the CPU load is cut almost in half for Proxy transcoding. --> This makes sense, as proxy is half resolution.



Oddly, timeline playback of half-resolution Proxy Media (2880x1440) is twice as taxing on the eGPU when compared to full-resolution Optimized Media running with Best Performance. Perhaps Best Performance steps down to ¼ resolution?


Proxy media is half resolution; Best Performance is whatever the size of the preview window is, so it may be smaller.


Stabilization Analysis only used the internal GPU with 10.4.8 and did not push CPU usage much beyond 50%. This resulted in 40% slower performance when compared to 10.4.6 which used some eGPU but over 90% CPU utilization.


If you have optimised media, the analysis will be performed on that; if not, it uses the original media. If you performed the analysis on H.264, what your tests are showing is that FCP is decoding with hardware acceleration in 10.4.8 but not in 10.4.6. In practical terms, nobody is going to hit analyse for stabilisation and then wait, so even if it takes longer you have CPU free to do other tasks and do not bring the system to a halt.


In general, the type of tests you are conducting are not the way the majority of Final Cut users work; they seem more like a sequence of batch steps where you join clips, apply a bunch of effects, and then look at how it came out without rendering.

If this is the way you intend to work, it makes no sense to optimise the clips. What you should try to do is:

  1. Import the clip without optimisation and leave it on the external drive (an internal SSD for rendering is a must)
  2. Run stabilisation or other analysis on the native clips
  3. Render your project
  4. Play back the render files

If you end up with something you don't like, change the effect and start over.


This will save you disk space and time. Optimising files only makes sense if you are actually looking at real-time effects as they develop, or testing joins and cuts. Once your timeline is rendered, export will be fast as it will use the render files.

Jan 7, 2020 10:11 PM in response to MZTVPD

Very good style sheet! This is very strange. I did some testing with 4K material and created a 16 min project with transitions and video effects. I have a very similar setup to yours, only my eGPU is an RX5700 in a Sonnet box. I did not do as much as you, but I still have very different observations from yours. In my case, with no optimization and no proxy, background rendering uses the eGPU in both cases (eGPU not connected to a display), but 10.4.8 is about 20-25% faster. Are you also on Catalina?
