How to stream responses from AutoModelForCausalLM?

I’m benchmarking models with time to first token (TTFT) as the metric, so I need streaming responses to measure when the first token arrives for models hosted on Hugging Face.

It seems AutoModelForCausalLM (and its generate() method) doesn’t have a ‘stream’ flag, and I tried TGI (Text Generation Inference), but it doesn’t support GGUF models.

Is there any way to stream responses using AutoModelForCausalLM, or some other approach?
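For context, here’s roughly what I’m aiming for: a sketch using transformers’ TextIteratorStreamer, which (as far as I understand) lets you consume generated text chunk by chunk while generate() runs in a background thread. The model id "gpt2" is just a placeholder for whichever model is being benchmarked.

```python
# Sketch: measure time-to-first-token (TTFT) with TextIteratorStreamer.
# "gpt2" is a placeholder model id; swap in the model under test.
import threading
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # placeholder; any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# skip_prompt=True so the first yielded chunk is newly generated text
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until completion, so run it in a thread
# and consume the streamer on the main thread
thread = threading.Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=16, do_sample=False),
)

start = time.perf_counter()
thread.start()
first_chunk = next(iter(streamer))  # blocks until the first token is decoded
ttft = time.perf_counter() - start
thread.join()

print(f"first chunk: {first_chunk!r}, TTFT: {ttft:.3f}s")
```

Is something like this the intended way to do it, or is there a cleaner API for timing the first token?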