I was running Llama 3 Instruct on my CPU using transformers and getting reasonable generation times of around 1-2 minutes. After trying to update CUDA so I could run it on my GPU, it now takes around 40-50 minutes when running on the CPU, even though I didn't change anything about the original environment. I've even tried reinstalling the model and all libraries in a new env. Does anybody have any idea what could cause this?
Here's my code:
import math
import transformers
import torch
import time
ts = time.time()
lt = time.localtime()
print(f"Loading Model {lt.tm_hour}:{lt.tm_min}:{lt.tm_sec}")
device = "cpu"
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "low_cpu_mem_usage": True,
    },
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "What be the best way to find buried treasure?"},
]
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
lt = time.localtime()
print(f"Model loaded in {math.floor((time.time() - ts)/60):02d}:{round((time.time() - ts)%60):02d} seconds at {lt.tm_hour}:{lt.tm_min}:{lt.tm_sec}")
ts = time.time()
print(f"Generating response {lt.tm_hour}:{lt.tm_min}:{lt.tm_sec}")
outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
lt = time.localtime()
print(f"Time taken {math.floor((time.time() - ts)/60):02d}:{round((time.time() - ts)%60):02d} seconds at {lt.tm_hour}:{lt.tm_min}:{lt.tm_sec}")
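In case it helps narrow things down, here is a small sketch (just standard PyTorch/transformers introspection calls, nothing specific to my setup) that I can run in both the old and the new environment to compare the active torch build and CPU thread settings:

import torch
import transformers

# Which builds are active (a CUDA wheel can behave differently on CPU than a CPU-only wheel)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA build:", torch.version.cuda)            # None for CPU-only wheels
print("CUDA available:", torch.cuda.is_available())

# How many CPU threads torch is allowed to use
print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())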