Issue: YOLO Model Training Extremely Slow on M3 Max

Question

Level 1

4 points

System

MacBook Pro with M3 Max (40-core GPU, 40GB GPU RAM)
128GB unified memory
macOS Sequoia 15.3.2

Problem

Despite high-end hardware, YOLO model training is unexpectedly slow with poor GPU utilization. The same models train significantly faster on NVIDIA hardware.

Attempted Solutions

Latest Apple Silicon-optimized ML frameworks
Various batch sizes (4-32)
Both MPS and CPU backends
Native and Rosetta environments

Request

Please advise on optimizing YOLO training for Apple Silicon. Are there specific configurations or known limitations with MPS implementation for object detection models?

This high-end machine was purchased specifically for ML development, but its current performance makes it impractical for YOLO training and for other deep learning frameworks used for image processing.

Thanks

Posted on May 9, 2025 7:56 PM

Answer 1

May 10, 2025 12:38 PM in response to DrSaadLa

<< After all that, am I missing something out? I asked myself. >>

Yes, you are missing something.

Models ported from another system may need optimization for the exact hardware you are running on. That is often not as simple as flipping a switch.

If you just recompiled for a different processor, it would run, but you would call it very slow, because it cannot immediately take advantage of the array transform processors offered by the GPU and Neural Engine unless either the code or the runtime environment is modified to use those co-processors.
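
A rough way to see whether work actually reaches the GPU is to time the same operation on the CPU and on the MPS device. The sketch below assumes PyTorch 2.0 or later is installed; the function names `time_matmul` and `compare_devices` are made up for illustration, and a matrix multiply is only a stand-in for a real training step.

```python
import time


def time_matmul(device: str, size: int = 2048, repeats: int = 10) -> float:
    """Return average seconds per (size x size) matrix multiply on `device`."""
    import torch  # imported lazily so the module loads without PyTorch

    x = torch.rand(size, size, device=device)
    y = torch.rand(size, size, device=device)
    _ = x @ y  # warm-up run, excluded from the timing
    if device == "mps":
        torch.mps.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        _ = x @ y
    if device == "mps":
        # MPS work is queued asynchronously; wait for it before stopping the clock.
        torch.mps.synchronize()
    return (time.perf_counter() - start) / repeats


def compare_devices(size: int = 2048, repeats: int = 10) -> dict:
    """Time the CPU, plus MPS when available; empty dict if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return {}
    devices = ["cpu"]
    if torch.backends.mps.is_available():
        devices.append("mps")
    return {d: time_matmul(d, size, repeats) for d in devices}


# Example: print(compare_devices())
```

If the `mps` entry is missing from the result, the interpreter cannot see the GPU at all; if it is present but not clearly faster than `cpu`, the bottleneck is probably elsewhere in the pipeline.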

Answer 2

May 10, 2025 3:58 PM in response to DrSaadLa

"Apple Silicon MPS Training

With the support for Apple silicon chips integrated in the Ultralytics YOLO models, it's now possible to train your models on devices utilizing the powerful Metal Performance Shaders (MPS) framework. The MPS offers a high-performance way of executing computation and image processing tasks on Apple's custom silicon.

To enable training on Apple silicon chips, you should specify 'mps' as your device when initiating the training process. Below is an example."

https://docs.ultralytics.com/modes/train/#idle-gpu-training
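
The example mentioned in the quoted passage did not make it into this post. As a minimal sketch of what such a call might look like with the Ultralytics Python API: the weights file `yolo11n.pt`, the dataset file `coco8.yaml`, the epoch count, and the helper names `build_train_args` and `train_on_mps` are all assumptions chosen for illustration, not values from this thread.

```python
def build_train_args(use_mps: bool, data: str = "coco8.yaml", epochs: int = 3) -> dict:
    """Assemble keyword arguments for model.train(); dataset and epochs are placeholders."""
    return {
        "data": data,
        "epochs": epochs,
        "imgsz": 640,
        # 'mps' selects the Apple GPU; fall back to the CPU when it is unavailable.
        "device": "mps" if use_mps else "cpu",
    }


def train_on_mps(weights: str = "yolo11n.pt"):
    """Train a YOLO model, using the MPS backend when PyTorch reports it is available."""
    # Imported lazily so this file can be loaded without the packages installed.
    import torch
    from ultralytics import YOLO

    use_mps = torch.backends.mps.is_available()
    model = YOLO(weights)
    return model.train(**build_train_args(use_mps))


# Example: results = train_on_mps()
```

Keeping the device choice in a small helper makes it easy to confirm which backend a run actually used before comparing timings.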

Answer 3

May 10, 2025 8:06 AM in response to DrSaadLa

The excellent speed of AI tasks on Apple-silicon Macs depends on using not only the CPU and GPU, but also the Neural Engine, a specialized, fast, short-floating-point array transform processor.

https://en.wikipedia.org/wiki/Neural_Engine

Vast improvements are possible. Here is an example, found with a quick search, of a user who made a slight change that dramatically improved YOLO performance:

https://pysource.com/2023/03/28/object-detection-with-yolo-v8-on-mac-m1/

Answer 4

a brody

Level 10

85,382 points

May 10, 2025 8:02 AM in response to DrSaadLa

I would contact the authors of YOLO and advise them of the problem. They need to start getting bug reports from users, and to ask through their Apple Developer accounts why this is happening.

A couple of things I would also keep in mind:

iCloud, Time Machine, and other auto-syncing software can affect performance. Unless you have a hard-wired fiber-optic internet connection, you will get lag from the asynchronous network communication happening in the background.

Optimizing software does nothing of the sort. It is a placebo, because it dumps temporary cache files for the system that can get corrupted. Left running overnight and restarted the next morning, the system will do its own cleanup better. EtreCheck can identify other memory-resident programs you may want to remove if they are unwanted.

Don't let your hard drive get over 85% full. This tested benchmark has been found true of all computers, as a point of diminishing returns. Archive to external media and the Cloud any data you don't need instant access to.

Answer 5

DrSaadLa Author

Level 1

4 points

May 10, 2025 10:05 AM in response to Grant Bennet-Alder

I have bought the ultimate M3 Max. I was very satisfied until recently, when I had to use pre-trained models for object detection (YOLO series).

The machine is blazingly fast with deep learning models for time series, image classification and other tasks.

But when it comes to YOLO, it is significantly slower. I did thorough research and tested many solutions (different versions of Python, torch, torch-nightly, and different parameters such as batch size, n-workers ...); nothing worked.

What disappointed me is that a single model took 24 hours to train, while it took one hour on an NVIDIA GPU (I tested that on Kaggle). Since I need more memory, I have to train that model on my local machine.

The issue is that ensemble models or different model architectures would then take weeks or even months. Is that logical?

After all that, am I missing something out? I asked myself.

With all due respect, some available solutions on other platforms such as the video on YouTube are naively simple.

Answer 6

etresoft

Level 9

58,014 points

May 10, 2025 1:42 PM in response to DrSaadLa

DrSaadLa wrote:

Please advise on optimizing YOLO training for Apple Silicon. Are there specific configurations or known limitations with MPS implementation for object detection models?

No one here knows anything about YOLO. It is the developer's responsibility to ensure their product works on Apple platforms. Maybe they do that, maybe they don't.

Answer 7

WheelieNick

Level 8

46,341 points

May 10, 2025 4:24 AM in response to DrSaadLa

Is the YOLO training optimized for Apple silicon, or are you running it under Rosetta?
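
One way to answer this from inside the Python environment used for training is sketched below, assuming a standard CPython install. The helper name `describe_runtime` is made up for illustration. On an Apple silicon Mac, `platform.machine()` reports `arm64` for a native interpreter and `x86_64` for one translated by Rosetta, and the `sysctl.proc_translated` key reads `1` for a translated process.

```python
import platform
import subprocess


def describe_runtime() -> dict:
    """Report the interpreter's architecture and whether Rosetta is translating it."""
    info = {"machine": platform.machine(), "translated": None}
    if platform.system() == "Darwin":
        try:
            out = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True,
                text=True,
                check=False,
            ).stdout.strip()
            # "1" means translated by Rosetta, "0" means native;
            # anything else (e.g. the key is absent) is left as None.
            info["translated"] = {"1": True, "0": False}.get(out)
        except OSError:
            pass  # sysctl could not be run; leave the value as None
    return info


# Example: print(describe_runtime())
```

A result showing `x86_64` or `translated: True` would suggest the Python environment itself is an Intel build, in which case the MPS backend is not expected to be usable.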

Answer 8

DrSaadLa Author

Level 1

4 points

May 10, 2025 5:29 AM in response to WheelieNick

It is optimized for Apple silicon.

Answer 9

DrSaadLa Author

Level 1

4 points

May 10, 2025 10:20 AM in response to a brody

I have to tell you, a brody, that nothing you mentioned applies to my situation. As for the hard drive, I have 3.5TB free.

Answer 10

a brody

Level 10

85,382 points

May 10, 2025 2:17 PM in response to DrSaadLa

That's why I point you toward the developer, as everyone else does. If nothing I mentioned could remotely be the cause, then the developer is the only one who would have a clue why it is happening.

Answer 11

a brody

Level 10

85,382 points

May 10, 2025 2:23 PM in response to DrSaadLa

https://docs.ultralytics.com/

This is where you need to look up what YOLO is about, not on the Apple website.
