Transitioning to Metal with 10.4.8 - Benchmarks & Observations
I want to preface this post by saying that these tests are representative of a very specific workflow as it applies to how I currently use Final Cut Pro X, and they should not be seen as a general overview. Despite this, I hope some of you find this information useful.
Host machine: 2018 Mac mini - 3.0GHz 6-core i5 - 32GB of RAM - Intel UHD Graphics 630
eGPU: Razer Core X with an MSI Radeon Vega 56 8GB
Display: BenQ BL2420PT at 2560x1440 using HDMI input
Working storage: SanDisk Extreme Portable 2TB SSD using USB-C (500/475 MB/s)
Source footage is 200 Mbps H.264 video from an Insta360 EVO 360/180 camera with a resolution of 5760x2880 (5.7K). Timeline Playback tests were conducted with Optimized or Proxy Media. Both the Shared timeline and the playback timelines were completely Unrendered (more on that below).
The Shared Project is a 4-minute, 1:1 ratio, 360-degree over/under stereoscopic video with a resolution of 5760x5760, equivalent to the pixel count of an 8K video. In order to conform two 180-degree stereoscopic videos to a 360-degree stereoscopic timeline, it contains four 5.7K video layers, each with a transform, custom mask, and colour correction. There is also a PNG graphic with an alpha channel over the full timeline. Everything is placed in a compound clip with a reorient transform applied. This is the timeline used for both the Share and Timeline Playback tests.
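The "8K pixel count" comparison is easy to check with quick arithmetic. This is just a sketch, and it assumes "8K" means 8K UHD (7680x4320); the timeline dimensions are the ones stated above.

```python
# Pixel-count comparison for the Shared Project timeline.
timeline_w, timeline_h = 5760, 5760   # timeline resolution from the post
uhd8k_w, uhd8k_h = 7680, 4320         # assumption: "8K" refers to 8K UHD

timeline_pixels = timeline_w * timeline_h
uhd8k_pixels = uhd8k_w * uhd8k_h

print(f"Timeline: {timeline_pixels:,} pixels per frame")
print(f"8K UHD:   {uhd8k_pixels:,} pixels per frame")
print("Same pixel count:", timeline_pixels == uhd8k_pixels)
```

Under that assumption the two frame sizes come out identical, even though the aspect ratios are very different.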
Tests were conducted with the pre-Metal Final Cut Pro X 10.4.6 and the Metal-enhanced Final Cut Pro X 10.4.8, with the display connected either directly to the Mac mini via HDMI (Built-in Display) or to the eGPU HDMI output (eGPU External Display).
All utilization percentages are observed averages and are for approximate reference only.
Personal Observations & Takeaway
With a few small exceptions and one big one (more on that below), the performance difference between 10.4.6 and 10.4.8 in this particular setup is minimal. For whatever reason, though, Stabilization Analysis is consistently about 40% faster and Optimized Media Transcode is about 30% faster with 10.4.6.
The biggest deciding factor is Timeline Playback for Unrendered timelines if using the Metal-enhanced 10.4.8. You must connect your display to the eGPU (and set it as primary or mirrored) to see eGPU real-time playback acceleration. This will probably frustrate most iMac users and some MacBook users who are completely content with their built-in displays. I also prefer not to run the eGPU 24/7, and I enjoy the convenience of powering on the eGPU to get an on-demand benefit without having to disconnect and reconnect cables. This is not the case with the pre-Metal 10.4.6, where you see the full benefit with your built-in display.
One could make the case to turn on Background Rendering, but that simply is not the same as having an immediate real-time preview of what you are working on. And in my particular case, 360/180 footage is often a series of long continuous clips, which means every change would trigger a render reset for the whole clip.
A bit of a discovery came when I accidentally ran the project Share test without Optimized Media, so the original H.264 assets were used instead. In every single case, the Optimized Media timeline took about 30% longer! Not sure what's at play here, but maybe someone else could shed some light on this. This is really not an issue for me since I use Proxy Media while working anyway.
Smaller Observations
Final Cut Pro X 10.4.6 performance is virtually identical whether using the built-in display or a display connected to the eGPU - in both task runtime and hardware utilization.
For Optimized and Proxy Media transcoding (H.264 source), the eGPU is employed almost equally, but the CPU load is cut almost in half for Proxy transcoding. Interestingly, with 10.4.6 the task runtimes are identical for both.
Oddly, timeline playback of half-resolution Proxy Media (2880x1440) is twice as taxing on the eGPU when compared to full-resolution Optimized Media running with Best Performance. Perhaps Best Performance steps down to ¼ resolution?
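To put numbers on that comparison, here is a rough sketch of the per-frame pixel counts involved. The full and Proxy resolutions are from the post; the 1440x720 figure is purely hypothetical (a quarter of the full resolution per axis) and is not documented Final Cut Pro behaviour.

```python
# Per-frame pixel counts for the media types compared above.
full_w, full_h = 5760, 2880      # source / Optimized Media (from the post)
proxy_w, proxy_h = 2880, 1440    # half-resolution Proxy Media (from the post)
quarter_w, quarter_h = 1440, 720 # hypothetical: 1/4 resolution per axis

full_pixels = full_w * full_h
proxy_pixels = proxy_w * proxy_h
quarter_pixels = quarter_w * quarter_h

# Ratio of pixels per frame relative to the full-resolution source.
proxy_ratio = proxy_pixels / full_pixels
quarter_ratio = quarter_pixels / full_pixels

print(f"Proxy:   {proxy_pixels:,} pixels ({proxy_ratio:.4f} of full)")
print(f"Quarter: {quarter_pixels:,} pixels ({quarter_ratio:.4f} of full)")
```

Half resolution per axis already means a quarter of the pixels per frame, so if pixel count alone drove eGPU load, Best Performance would have to be working on even fewer pixels than Proxy does to explain the lower utilization. Decode cost and codec differences could matter just as much, so treat this as a sanity check rather than an explanation.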
Stabilization Analysis only used the internal GPU with 10.4.8 and did not push CPU usage much beyond 50%. This resulted in 40% slower performance when compared to 10.4.6, which made some use of the eGPU but pushed CPU utilization above 90%.
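A note on the percentages: "40% faster" and "40% slower" are not the same size of gap once you convert them to runtimes. The sketch below assumes the percentages refer to elapsed time, which the post does not spell out; the helper names are made up for illustration, and no measured runtimes from these tests are used.

```python
def longer_pct_from_shorter_pct(shorter_pct: float) -> float:
    """If run A takes shorter_pct% less time than run B,
    return how much longer (in %) B takes compared with A."""
    a_over_b = 1 - shorter_pct / 100      # A's runtime as a fraction of B's
    return (1 / a_over_b - 1) * 100


def shorter_pct_from_longer_pct(longer_pct: float) -> float:
    """If run B takes longer_pct% more time than run A,
    return how much less time (in %) A takes compared with B."""
    b_over_a = 1 + longer_pct / 100       # B's runtime as a multiple of A's
    return (1 - 1 / b_over_a) * 100


# Illustrative conversions only, using the 40% figure quoted in the post.
print(f"40% less time   -> other run takes {longer_pct_from_shorter_pct(40):.1f}% longer")
print(f"40% longer time -> other run takes {shorter_pct_from_longer_pct(40):.1f}% less time")
```

Either way the direction of the result is the same (10.4.6 wins this test); the conversion only matters if you want to compare the size of the gap against your own timings.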