There is no user control over “VRAM” allocation, and the allocation is dynamic.
Some background: the M1 Max doesn’t have the concept of dedicated VRAM. Dedicated VRAM arises in GPU and system designs that lack a fast path from the GPU to main memory. In such a design, the GPU needs to keep data local for performance reasons, so it has its own memory. A hardware cache, if you will.
The M1 Max has a fast path to main (unified) memory: 400 GB/s, shared between the GPU and CPU. For comparison, that’s close to the PlayStation 5’s memory bandwidth of 448 GB/s.
Viewed simplistically, the GPU can use as much of the unified memory pool as it needs, minus what macOS and the non-GPU parts of the active apps require for their own work. When passing data around, the GPU doesn’t need to copy anything to a separate memory: it can pass a pointer to the existing “VRAM” data, which already lives in main (unified) memory.
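You can see this dynamic sizing directly through Metal. The sketch below (assuming macOS and a Metal-capable device; the printed values will vary by machine) queries how much memory the OS will let the GPU use — on Apple Silicon this is just a large fraction of unified memory, not a fixed VRAM carve-out:

```swift
import Metal

// Query the default GPU. On an M1 Max this is the integrated GPU sharing
// unified memory with the CPU.
if let device = MTLCreateSystemDefaultDevice() {
    print("GPU: \(device.name)")
    // true on Apple Silicon: GPU and CPU share one physical memory pool
    print("Unified memory: \(device.hasUnifiedMemory)")
    // An OS-chosen hint, not a hardware partition; it scales with total RAM
    let gib = Double(device.recommendedMaxWorkingSetSize) / Double(1 << 30)
    print(String(format: "Recommended max working set: %.1f GiB", gib))
    // How much this process has currently allocated on the device
    print("Currently allocated: \(device.currentAllocatedSize) bytes")
}
```

Note that `recommendedMaxWorkingSetSize` is a soft guideline the system computes, which is exactly the point: there is no fixed number to configure.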
In practice, you’ll want to profile the ML app and see where it is actually spending both its time and its memory.