When I use the same prompts, I find that the Hugging Face generator (`AutoModelForCausalLM.from_pretrained(...).generate()`) is about six times slower than the original LLaMA 2 generator (`LLaMA.generate()`). Why is this, and how can I speed it up?
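For reference, here is a minimal sketch of the Hugging Face side of the comparison (the checkpoint name and generation settings are my assumptions, not taken from any particular setup). A common cause of this kind of slowdown is loading the model in fp32 and/or on CPU, whereas the reference LLaMA code runs in fp16 on CUDA by default; loading in half precision on the GPU usually closes most of the gap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical checkpoint; substitute yours

tokenizer = AutoTokenizer.from_pretrained(model_id)

# A plain fp32 load like this is a frequent cause of slow generation:
# model = AutoModelForCausalLM.from_pretrained(model_id)

# Load in half precision and move to the GPU, matching the reference code:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=128,
    use_cache=True,  # KV cache; on by default, but worth confirming
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```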