Llama-2 significantly slower than other models on Hugging Face

surya-narayanan · September 8, 2023, 11:09pm

While running the Llama and Falcon pipelines, I found that Llama-2 is over 30x slower than Falcon at the same size (7B).

Is this expected?
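
For anyone trying to reproduce a comparison like this, here is a minimal sketch of a timing harness that reports generated tokens per second, so two models can be compared on the same footing. The names `tokens_per_second` and `fake_generate` are hypothetical and not part of any library. The sketch assumes you can wrap each model in a function that takes a prompt and returns only the newly generated token IDs.

```python
import time
from typing import Callable, List


def tokens_per_second(
    generate: Callable[[str], List[int]],
    prompt: str,
    runs: int = 3,
    warmup: int = 1,
) -> float:
    """Return generated tokens per second, aggregated over `runs` timed calls."""
    # Warm-up calls are not timed; the first call often includes one-off setup cost.
    for _ in range(warmup):
        generate(prompt)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(new_tokens)
    return total_tokens / total_seconds


def fake_generate(prompt: str) -> List[int]:
    # Hypothetical stand-in for a real model call: pretends to emit 20 tokens.
    time.sleep(0.01)
    return list(range(20))


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "Hello", runs=3)
    print(f"{rate:.1f} tokens/s")
```

To compare real models, replace `fake_generate` with a wrapper around each model's generation call, using the same prompt and the same maximum number of new tokens for both. It is also worth checking that both models are loaded with the same dtype and on the same device, since a mismatch there can account for a large speed gap by itself.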

MaMJ · December 25, 2023, 12:32pm

Did you figure out why? I also found that Llama-2-7b is much slower and requires a smaller batch size than OPT-6.7b, even though they are almost the same size.

surya-narayanan · May 7, 2024, 10:16pm

Haven’t figured it out.
