If you are doing machine learning or modeling, the Max variants of Apple's M-series chips mainly add more CPU and GPU cores. Every variant has the same 16-core Neural Engine: a very fast array processor for low-precision floating-point math, which is exactly what machine learning and modeling need.
--------
When you write code in regular Python, you are using CPU computation. One instruction operates on one operand at a time, and every simple computation carries the overhead of the interpreter loop wrapped around it.
Lots of machine learning on all platforms uses the numerical capabilities of the GPU. A GPU is essentially an array processor (Single Instruction, Multiple Data, or SIMD) that greatly speeds up processing by applying one floating-point operation to a large amount of data at once. It was invented that way so that you could adjust an entire screen buffer with one operation.
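The contrast between the two paragraphs above can be seen on any machine with NumPy installed. This is a minimal sketch: NumPy here runs on the CPU (using its SIMD units), not the GPU, but the principle is the same, one call dispatches a single operation over a whole array instead of looping in the interpreter.

```python
import time
import numpy as np

n = 1_000_000
xs = np.arange(n, dtype=np.float32)

# Plain Python loop: one operand per step, plus interpreter overhead each time.
t0 = time.perf_counter()
slow = [x * 2.0 for x in xs]
t_loop = time.perf_counter() - t0

# Vectorized: one call applies the multiply across the entire array at once.
t0 = time.perf_counter()
fast = xs * 2.0
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

On a typical machine the vectorized version is one to two orders of magnitude faster; the exact ratio depends on your hardware.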
side note: NVIDIA is currently the darling of the AI business because of the numeric capabilities of its graphics processors, and the CUDA programming platform surrounding them that makes them comparatively easy to use. But they do not have any real magical capabilities!
Apple's Neural Engine, built into its Apple-silicon chips, is very fast and efficient, and should not be ignored. It provides EXACTLY what is needed for fast machine-learning computation. But you need to deliberately choose to use it!
If you want modeling performance, you need to look into Apple's APIs and tools for machine learning, and use the APIs that take advantage first of the Neural Engine, and then of ALL available numeric hardware: Neural Engine, GPU numerics, and CPU numerics.
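One concrete way to make that deliberate choice is Core ML's compute-unit setting, exposed in Python through the coremltools package. This is a minimal sketch, assuming coremltools is installed and you already have a compiled model; the model filename below is hypothetical, so the load itself is left commented out.

```python
try:
    import coremltools as ct
except ImportError:
    ct = None  # coremltools targets macOS/Linux; skip gracefully elsewhere

if ct is not None:
    # ComputeUnit.ALL lets Core ML schedule work across the Neural Engine,
    # GPU, and CPU; CPU_AND_NE restricts it to the CPU plus Neural Engine.
    units = ct.ComputeUnit.ALL

    # Hypothetical model path -- substitute your own compiled model:
    # model = ct.models.MLModel("YourModel.mlpackage", compute_units=units)
```

The point is that the Neural Engine is not used by default from an arbitrary Python stack; you opt in through Core ML (or a framework that converts to it).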
The tools are there, but it takes some work on your part to read up on them and put them to use.