Hi,
I’m working on fine-tuning BLOOM and deploying the model on SageMaker, and I wanted to know whether it’s possible to stream the generated text, for example by directly modifying the inference functions?
I already tried applying the TextIteratorStreamer inside the predict_fn function, but that doesn’t seem to be the solution.
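For reference, here is the general pattern I mean: generation runs in a background thread and pushes text chunks onto a queue while the caller iterates over them as they arrive. This is a minimal, self-contained sketch of the idea behind transformers’ TextIteratorStreamer; the `SimpleIteratorStreamer` class and `fake_generate` function are stand-ins I wrote for illustration, not the real library API or model call.

```python
# Minimal sketch of the iterator-streamer pattern: a producer thread puts
# text chunks onto a queue, and the consumer iterates over them as they
# arrive. transformers.TextIteratorStreamer works on the same principle,
# with model.generate(..., streamer=streamer) as the producer.
from queue import Queue
from threading import Thread


class SimpleIteratorStreamer:
    """Queue-backed iterator, analogous in spirit to TextIteratorStreamer."""

    _SENTINEL = object()  # marks the end of generation

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        # Called by the producer for each newly generated chunk.
        self._queue.put(text)

    def end(self):
        # Called by the producer when generation is finished.
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the next chunk arrives
        if item is self._SENTINEL:
            raise StopIteration
        return item


def fake_generate(streamer, tokens):
    # Hypothetical stand-in for the real model.generate(...) call.
    for tok in tokens:
        streamer.put(tok)
    streamer.end()


streamer = SimpleIteratorStreamer()
thread = Thread(target=fake_generate, args=(streamer, ["Hello", " ", "world"]))
thread.start()
chunks = [chunk for chunk in streamer]  # consumed incrementally as they arrive
thread.join()
print("".join(chunks))  # -> Hello world
```

The catch is the last step: even if predict_fn consumes such an iterator, the endpoint still has to return one complete response, so the client never sees the intermediate chunks.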
Any idea about this would be really appreciated.
SageMaker does not currently support streaming responses.
Hi @Ludivine , hi @philschmid ,
I’m also trying to stream the output of my LLM in a GPT-like style. I’m trying to use HTTP requests and an iterator streamer. However, it would be great if this were a built-in possibility with huggingface/sagemaker.
Thanks for your great work.