Streaming Inference without TGI

muhammadfhadli · November 2, 2023, 4:20am

I found this tutorial for using TGI (Text Generation Inference) with the Docker image: Text Generation Inference.

However, I’m having trouble using a GPU inside a Docker container, so I was wondering if there is another way to stream the model’s output. I have tried TextStreamer, but it only prints the result to standard output. In my case, I need to send the streamed output to the frontend, similar to how ChatGPT works.
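
For context, here is a rough sketch of the kind of thing I am after, not a confirmed solution. It assumes the `TextIteratorStreamer` class from transformers, which exposes generated text as a Python iterator instead of printing it, with `generate` running in a background thread. The helper names `stream_generate` and `sse_events`, the JSON payload shape, and the `[DONE]` end marker are hypothetical choices for illustration; the frames are formatted as Server-Sent Events so a frontend could read them incrementally.

```python
import json
from threading import Thread
from typing import Iterable, Iterator


def stream_generate(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> Iterator[str]:
    """Yield decoded text pieces as the model produces them (hypothetical helper)."""
    # Imported lazily so the SSE helper below can be used without transformers installed.
    from transformers import TextIteratorStreamer

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # skip_prompt=True keeps the prompt itself out of the streamed text.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=max_new_tokens)

    # generate() blocks until it finishes, so it runs in a separate thread
    # while this generator consumes the streamer.
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()
    for text in streamer:
        yield text
    thread.join()


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames (hypothetical payload shape)."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks
        # JSON-encode so newlines in the text cannot break the SSE framing.
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    # Assumed end-of-stream marker; the frontend would need to agree on it.
    yield "data: [DONE]\n\n"
```

The output of `sse_events(stream_generate(model, tokenizer, prompt))` could then presumably be passed to a web framework's streaming response (for example FastAPI's `StreamingResponse` with `media_type="text/event-stream"`), though I have not verified this end to end.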
