Streaming output text when deploying on SageMaker

I’m working on fine-tuning BLOOM and deploying the model on SageMaker, and I wanted to know whether it’s possible to stream the generated output text by directly modifying the inference functions.
I already tried applying the TextIteratorStreamer within the predict_fn function, but that doesn’t seem to be the solution.
Any idea about this would be really appreciated :blush:
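For context, the reason this pattern doesn’t stream on its own is that predict_fn has to return one complete payload, so consuming the streamer inside it just re-assembles the full text before anything is sent back. The producer/consumer mechanism behind TextIteratorStreamer itself looks roughly like the sketch below (a minimal stdlib illustration; the class and `fake_generate` are stand-ins, not the transformers implementation):

```python
import queue
import threading

class IteratorStreamer:
    """Minimal sketch of the pattern behind transformers'
    TextIteratorStreamer: a generation thread pushes text pieces
    into a queue, and a consumer iterates over them as they arrive."""
    _END = object()  # sentinel marking the end of generation

    def __init__(self):
        self._q = queue.Queue()

    def put(self, text):
        self._q.put(text)

    def end(self):
        self._q.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._q.get()  # blocks until the producer emits something
        if item is self._END:
            raise StopIteration
        return item

def fake_generate(streamer, pieces):
    # Stands in for model.generate(..., streamer=streamer) running
    # in a background thread.
    for piece in pieces:
        streamer.put(piece)
    streamer.end()

streamer = IteratorStreamer()
thread = threading.Thread(target=fake_generate, args=(streamer, ["Hello", " world"]))
thread.start()
chunks = list(streamer)  # the consumer sees pieces as they are produced
thread.join()
print(chunks)
```

Inside predict_fn, this iteration would complete before the function returns, which is why the chunks never reach the client incrementally.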


SageMaker currently does not support streaming responses.


Ok thank you @philschmid

Hi @Ludivine , hi @philschmid ,

I’m also trying to stream the output of my LLM in a GPT-like style. I’m trying to use HTTPS requests together with an iterator streamer. However, it would be great if this were a built-in possibility with huggingface/sagemaker.

Thanks for your great work.

Hey @RemiP, thanks for your response. Can you please elaborate on how you are streaming outputs from the LLM deployed as a Hugging Face inference endpoint? Appreciate your help :)

SageMaker real-time inference endpoints now support streaming. Check this blog
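With that launch, the client side can consume the response as an event stream. A sketch, assuming boto3’s `invoke_endpoint_with_response_stream` API: the endpoint name is a placeholder, `iter_stream_tokens` is an illustrative helper, and the exact payload framing inside each event depends on the serving container (TGI, for example, emits server-sent-event lines).

```python
def iter_stream_tokens(event_stream):
    """Yield decoded text from a SageMaker response stream.

    Each streamed event carries a 'PayloadPart' dict with raw bytes;
    events without one (e.g. control events) are skipped.
    """
    for event in event_stream:
        part = event.get("PayloadPart")
        if part:
            yield part["Bytes"].decode("utf-8")

def stream_from_endpoint(endpoint_name, prompt):
    # Illustrative usage against a real endpoint; requires AWS
    # credentials and a deployed streaming-capable container.
    import json
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,  # placeholder name
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt,
                         "parameters": {"max_new_tokens": 64}}),
    )
    for text in iter_stream_tokens(response["Body"]):
        print(text, end="", flush=True)

# Quick local check with fake events standing in for the AWS stream:
fake_events = [
    {"PayloadPart": {"Bytes": b"Hello"}},
    {"PayloadPart": {"Bytes": b" world"}},
]
print("".join(iter_stream_tokens(fake_events)))
```

Note that the deployed container must also support streaming (e.g. the Hugging Face TGI LMI images); enabling it on the endpoint alone is not enough.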