How to stream responses from AutoModelForCausalLM?

I’m benchmarking models with time to first token (TTFT) as the metric, so I need streaming responses to measure when the first token arrives for models hosted on Hugging Face.

It seems AutoModelForCausalLM (and its generate() method) doesn’t have a ‘stream’ flag, and I tried TGI (Text Generation Inference), but it doesn’t support GGUF models.

Is there any way to stream responses using AutoModelForCausalLM, or some other approach?
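For context, here’s roughly what I’m aiming for: a sketch using transformers’ TextIteratorStreamer, which (as far as I understand) lets you consume generated text chunk by chunk while generate() runs in a background thread. The model id "gpt2" is just a placeholder for whichever model is being benchmarked.

```python
# Sketch: measure time-to-first-token (TTFT) with TextIteratorStreamer.
# "gpt2" is a placeholder model id; swap in the model under test.
import threading
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # placeholder; any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# skip_prompt=True so the first yielded chunk is newly generated text
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until completion, so run it in a thread
# and consume the streamer on the main thread
thread = threading.Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=16, do_sample=False),
)

start = time.perf_counter()
thread.start()
first_chunk = next(iter(streamer))  # blocks until the first token is decoded
ttft = time.perf_counter() - start
thread.join()

print(f"first chunk: {first_chunk!r}, TTFT: {ttft:.3f}s")
```

Is something like this the intended way to do it, or is there a cleaner API for timing the first token?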