Hi all, I am enjoying the transformers library, but it is very slow for me: a small request with Llama-2-7b takes 15-30 minutes, whereas the same prompt takes a few seconds with ollama.ai. I am just using the pipeline method; has anyone else seen this? My setup is:
conda create --name pytorch39 python=3.9
conda activate pytorch39
conda install -c huggingface transformers
conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly
conda install -c anaconda pillow libtiff
conda install -c conda-forge accelerate einops
huggingface-cli login
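Before anything else, it is worth verifying that this environment actually got a CUDA build of PyTorch. The nightly install command above does not pin a GPU variant (e.g. pytorch-cuda=...), so depending on the platform the solver can resolve to a CPU-only package, and 15-30 minutes per request is typical of 7B inference on CPU. A quick sanity check:

import torch
print(torch.cuda.is_available())  # False means a CPU-only PyTorch build
print(torch.version.cuda)         # CUDA version the build targets, or None on CPU builds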
import torch
import transformers
from transformers import AutoTokenizer, pipeline

transformers.logging.set_verbosity_debug()
model = "meta-llama/Llama-2-7b-chat-hf" # meta-llama/Llama-2-7b-hf
llama_pipeline = pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
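One thing worth checking with device_map="auto": if the GPU does not have enough free memory for the roughly 13 GB of float16 weights, accelerate silently offloads layers to CPU RAM (or disk), and generation slows down by orders of magnitude. The placement the model actually ended up with is recorded on the model object:

print(llama_pipeline.model.hf_device_map)  # any "cpu" or "disk" entries mean layers were offloaded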
tokenizer = AutoTokenizer.from_pretrained(model, use_auth_token=True)  # use_auth_token is deprecated on newer transformers; use token=True there
prompt = 'Can you explain why grass is green?'
sequences = llama_pipeline(
    prompt,
    do_sample=True,  # without this, decoding is greedy and top_k is ignored
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=256,
)
print("Chatbot:", sequences[0]['generated_text'])