Conversational pipeline by huggingface transformer taking too long to generate output

harshraj · September 27, 2023, 2:52pm

I have fine-tuned the Llama-2 7B Chat model and am using the Hugging Face Transformers pipeline for a conversational task. The dataset used for fine-tuning is the OpenAssistant user/assistant chat dataset. When I invoke the chatbot to generate a completion for a conversational prompt, it takes forever to produce one. Please help me understand what is going wrong. I used 4-bit quantization so that I can fine-tune and run inference on the free Google Colab instance available to me.
