I’m working on fine-tuning BLOOM and deploying the model on SageMaker, and I wanted to know if it’s possible to stream the generated output text by directly modifying the inference functions?
I already tried applying the TextIteratorStreamer within the predict_fn function, but this doesn’t seem to be the solution.
Any idea about this would be really appreciated.
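For context on what TextIteratorStreamer does under the hood: it is a queue-backed iterator that the generation thread feeds token by token while the caller iterates on another thread. Below is a minimal stdlib sketch of that producer/consumer pattern (this is my own illustration, not the actual transformers implementation, and it runs independently of SageMaker):

```python
import queue
import threading

class TokenStreamer:
    """Queue-backed iterator: the producer (generation) thread calls put(),
    the consumer iterates to receive tokens as they are produced."""
    _DONE = object()  # sentinel marking the end of generation

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, token: str) -> None:
        self._queue.put(token)

    def end(self) -> None:
        self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._DONE:
                return
            yield item

def fake_generate(streamer: TokenStreamer) -> None:
    # Stand-in for model.generate(..., streamer=streamer)
    for token in ["Hello", ", ", "world", "!"]:
        streamer.put(token)
    streamer.end()

streamer = TokenStreamer()
thread = threading.Thread(target=fake_generate, args=(streamer,))
thread.start()
pieces = [tok for tok in streamer]  # the consumer sees tokens incrementally
thread.join()
print("".join(pieces))  # → Hello, world!
```

This is why applying the streamer inside predict_fn alone isn’t enough: the tokens are produced incrementally on the server, but the endpoint still has to return one complete HTTP response, so the incremental output never reaches the client.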
SageMaker currently does not support streaming responses.
Ok thank you @philschmid
I’m also trying to stream the output of my LLM in a GPT-like style, using HTTP requests and an iterator streamer. However, it would be great if this were a built-in capability of huggingface/sagemaker.
Thanks for your great work.
Hey @RemiP, thanks for your response. Can you please elaborate on how you are streaming outputs from an LLM deployed as a Hugging Face inference endpoint? Appreciate your help :)
SageMaker real-time inference endpoints now support response streaming. Check this blog post.
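On the client side, streaming endpoints are invoked with boto3’s `invoke_endpoint_with_response_stream`, whose response body is an event stream of `PayloadPart` chunks. A small sketch, assuming a deployed streaming-capable endpoint (the endpoint name and payload format below are hypothetical, and the exact chunk format depends on your serving container):

```python
import json  # used in the live-call sketch below

def stream_response(event_stream):
    """Yield decoded text chunks from a SageMaker response event stream.
    Each event carries a 'PayloadPart' dict with raw bytes of generated text."""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part:
            yield part["Bytes"].decode("utf-8")

# Live call (assumes a deployed endpoint; names/parameters are illustrative):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint_with_response_stream(
#     EndpointName="my-llm-endpoint",  # hypothetical endpoint name
#     ContentType="application/json",
#     Body=json.dumps({"inputs": "Hello", "parameters": {"max_new_tokens": 32}}),
# )
# for chunk in stream_response(response["Body"]):
#     print(chunk, end="", flush=True)

# Demo with fake events standing in for the live stream:
fake_events = [{"PayloadPart": {"Bytes": b"Hel"}}, {"PayloadPart": {"Bytes": b"lo!"}}]
text = "".join(stream_response(fake_events))
print(text)  # → Hello!
```

Note that payload parts can split the generated text at arbitrary byte boundaries, so real clients often buffer chunks before decoding or parsing.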