Inference is slow on M1 Mac despite MPS Torch backend

I’ve been trying to run a simple phi-2 example on my M1 MacBook Pro:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("mps")  # <-------- MPS backend

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

(This was after trying both the stable and nightly builds of PyTorch, installed with pip as well as with conda/micromamba, on Python 3.11.)
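
For completeness, the standard sanity check that the installed build actually exposes the MPS backend is:

import torch

print(torch.backends.mps.is_available())  # True when the MPS device can be used on this machine
print(torch.backends.mps.is_built())      # True when this PyTorch build was compiled with MPS support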

However, inference still takes a good 40 seconds, timing only the model.generate(...) call.

(After reading MPS device appears much slower than CPU on M1 Mac Pro · Issue #77799 · pytorch/pytorch · GitHub, I ran the same test with the model on CPU, and MPS is definitely faster than CPU, so at least nothing obviously broken is going on.)
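
The CPU-vs-MPS comparison was roughly along these lines (a sketch, not my exact script; it times only the generate() call and moves the model and inputs explicitly instead of relying on set_default_device, so the same function covers both devices):

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = '''def print_prime(n):
   """
   Print all primes between 1 and n
   """'''

def time_generate(device):
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2", torch_dtype="auto", trust_remote_code=True
    ).to(device)
    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)
    start = time.perf_counter()
    model.generate(**inputs, max_length=200)
    if device == "mps":
        torch.mps.synchronize()  # make sure queued MPS work finishes before stopping the clock
    return time.perf_counter() - start

print(f"cpu: {time_generate('cpu'):.1f} s")
print(f"mps: {time_generate('mps'):.1f} s")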

On the other hand, running phi-2 through MLX with the mlx-lm library makes inference almost instantaneous.
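
For reference, the mlx-lm side is roughly this ("mlx-community/phi-2" is my guess at the converted checkpoint name; point load() at whichever MLX conversion of phi-2 you actually use):

from mlx_lm import load, generate

# Assumed checkpoint name; substitute the MLX-converted phi-2 weights you have locally or on the Hub.
model, tokenizer = load("mlx-community/phi-2")

prompt = '''def print_prime(n):
   """
   Print all primes between 1 and n
   """'''

text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
print(text)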

Is this expected? Am I doing something wrong?

Same boat here: it's super slow and eats a lot of RAM in the process.