Running LLMs on M4 Max: more GPU resources, future plans
I'm able to run some LLMs on my Mac with an M4 Max, though they are noticeably slow because they need more resources than the machine provides. On Intel-based Macs, we could expand GPU resources externally with an eGPU. As I understand it, the Apple Silicon design does not permit this.
Since running LLMs locally is becoming commonplace, I wonder what other approaches might be available on this platform to expand those resources, or whether that will become possible.
MacBook Pro 16″, macOS 15.3