Issue: YOLO Model Training Extremely Slow on M3 Max
System
- MacBook Pro with M3 Max (40-core GPU, 40GB GPU RAM)
- 128GB unified memory
- macOS Sequoia 15.3.2
Problem
Despite high-end hardware, YOLO model training is unexpectedly slow with poor GPU utilization. The same models train significantly faster on NVIDIA hardware.
Attempted Solutions
- Latest Apple Silicon-optimized ML frameworks
- Various batch sizes (4-32)
- Both MPS and CPU backends
- Native and Rosetta environments
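For anyone trying to reproduce the backend and environment checks above, here is a minimal sketch. It assumes PyTorch is the framework in use, which this post does not state, and the function names (`is_rosetta_translated`, `pick_device`, `environment_report`) are hypothetical helpers written for illustration. It only reports whether the interpreter is running natively and whether an MPS device is visible. It does not measure training speed.

```python
import platform
import subprocess


def is_rosetta_translated() -> bool:
    """Return True if this process is running under Rosetta 2 (macOS only)."""
    if platform.system() != "Darwin":
        return False
    try:
        # sysctl.proc_translated is 1 for translated processes, 0 for native ones.
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return False
    return result.stdout.strip() == "1"


def pick_device() -> str:
    """Return "mps" if PyTorch reports a usable MPS backend, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the framework being used
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def environment_report() -> dict:
    """Collect the facts that are useful to attach to a report like this one."""
    return {
        "machine": platform.machine(),  # CPU architecture the interpreter reports
        "rosetta": is_rosetta_translated(),
        "device": pick_device(),
    }


if __name__ == "__main__":
    print(environment_report())
```

If `rosetta` is true, the interpreter is an x86_64 build running under translation, so that run is not a native Apple Silicon comparison.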
Request
Please advise on optimizing YOLO training for Apple Silicon. Are there specific configurations that help, or known limitations of the MPS implementation for object detection models?
This high-end machine was purchased specifically for ML development, but current performance makes it impractical for YOLO training and for other deep learning workloads in image processing.
Thanks