MPS is running slower than CPU on Mac M1 Pro

Hello everyone.

I have recently been testing version 0.3.0 of diffusers on my M1 Pro. Following the steps from How to use Stable Diffusion in Apple Silicon (M1/M2), I found that MPS is slower than CPU. Average execution times for similar prompts:

  • GPU: 331 s
  • CPU: 222 s

Has anyone else tested this?
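For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not taken from the guide: `time_runs` is a hypothetical helper, and `pipe` and `prompt` in the usage comment are assumed to be an already-built pipeline and a prompt string. It discards a warm-up call and averages several timed calls so that one-off setup cost does not skew the numbers.

```python
import time
from statistics import mean


def time_runs(fn, runs=3, warmup=1):
    """Return the mean wall-clock seconds of fn() over `runs` calls.

    `warmup` untimed calls are made first so one-off setup cost is excluded.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Hypothetical usage (assumes `pipe` and `prompt` already exist):
# seconds = time_runs(lambda: pipe(prompt), runs=3)
# print(f"{seconds:.1f} s per image")
```

Running the same helper once with the pipeline on "cpu" and once on "mps" gives two numbers that can be compared directly.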

Hi @polodealvarado! Your CPU numbers are very similar to the ones I get on my M1 Max, but as reported in the page you mentioned, I see much faster speeds when using the GPU. Would you mind sharing a few details so I can take a look? These would be useful:

  • The amount of RAM your computer has.
  • The version of PyTorch you installed.
  • Your macOS version.
  • A small code snippet, only if you made any changes to the example we provided.

Thanks a lot!
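If it helps, most of these details can be gathered with a short script. This is only a sketch using the Python standard library; `package_version` and `collect_environment` are hypothetical helper names, and the amount of RAM is not covered (it is shown in "About This Mac").

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a placeholder if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment():
    """Collect the version details that are useful in a bug report."""
    return {
        # mac_ver() returns an empty release string on non-macOS systems.
        "macOS version": platform.mac_ver()[0] or "n/a (not macOS)",
        "Machine": platform.machine(),
        "Python version": platform.python_version(),
        "Torch version": package_version("torch"),
        "diffusers version": package_version("diffusers"),
    }


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```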

Hi @pcuenq, thank you for answering.

Here are all the details, and a bit more:

  • RAM: 16 GB
  • GPU cores: 16
  • macOS version: 12.5.1
  • Python version: 3.9.13
  • diffusers version: 0.3.0
  • Torch version: 1.13.0.dev20220908

I have been using the example code unchanged. I also tried another Jupyter notebook from this repository, and the results are quite similar (CPU is faster than MPS).


I am following this thread, running mps backend. @pcuenq


That’s a very interesting thread! They specifically say that random operations are not yet optimized; however, diffusers’ code generates the random latents on the CPU when using the mps device.

I’ll do some testing, thanks!
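To illustrate what generating the latents on the CPU means, here is a rough sketch. It is not the actual diffusers code: `make_latents` is a hypothetical helper, and the shape in the usage comment is an assumption for a single 512x512 image. The random numbers are drawn on the CPU from a seeded generator and only then moved to the target device.

```python
def make_latents(shape, seed, device="cpu"):
    """Draw Gaussian latents on the CPU, then move them to `device`."""
    # Imported lazily so this snippet can be loaded without PyTorch installed.
    import torch

    # The generator lives on the CPU, so sampling does not use the mps backend.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    latents = torch.randn(shape, generator=generator, device="cpu")
    return latents.to(device)


# Hypothetical usage (shape assumed for one 512x512 image):
# latents = make_latents((1, 4, 64, 64), seed=0, device="mps")
```

With this approach the same seed gives the same starting latents whether the rest of the pipeline runs on "cpu" or "mps".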


This also happens to me, guys. My CPU takes around 4m 30s, but my GPU (mps) takes more than 20 minutes. Same code; I was simply changing:

pipe = pipe.to("mps")

to

pipe = pipe.to("cpu")

  • RAM: 16 GB
  • GPU cores: 16
  • macOS version: 12.6
  • Python version: 3.10.4
  • diffusers version: 0.6.0
  • Torch version: 1.14.0.dev20221031

We are going to release a new version of diffusers this week optimized for PyTorch 1.13, which was released last Saturday.

In the meantime, TL;DR:

  • Install the production version of PyTorch, not the nightly one. You should get version 1.13.0.
  • Use the main branch of diffusers instead of the one from PyPI (pip install git+
  • Use attention slicing to optimize memory usage and prevent swapping (pipe.enable_attention_slicing() after you create your Stable Diffusion pipeline).
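Put together, the setup could look roughly like the sketch below. `pick_device` and `build_pipeline` are hypothetical helper names, and the model id and prompt in the usage comment are assumptions; substitute whatever you are already using. Only `from_pretrained`, `to` and `enable_attention_slicing` are the pipeline calls referred to above.

```python
def pick_device(mps_available):
    """Choose the mps device when it is available, otherwise fall back to the CPU."""
    return "mps" if mps_available else "cpu"


def build_pipeline(model_id, device):
    """Create a Stable Diffusion pipeline on `device` with attention slicing enabled."""
    # Imported lazily so the helpers can be loaded without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to(device)
    # Attention slicing lowers peak memory use, which helps avoid swapping.
    pipe.enable_attention_slicing()
    return pipe


# Hypothetical usage (model id and prompt are assumptions):
# import torch
# device = pick_device(torch.backends.mps.is_available())
# pipe = build_pipeline("CompVis/stable-diffusion-v1-4", device)
# image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```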