Is it possible to have streaming responses from inference endpoints?

I assume the answer is no. Inference Endpoints seem to support only short-lived request/response connections (similar to AWS Lambdas).
I understand it’s possible in Hugging Face Spaces, for example via server-sent events (SSE), but I would prefer to have the scaling capabilities of Inference Endpoints for my application.

Is it possible?

For streaming responses, see: Deploy LLMs with Hugging Face Inference Endpoints

If you mean protocols other than HTTP, then the answer is no.
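For reference, token streaming with a text-generation-inference backed endpoint happens over a single HTTP response and can be consumed with the huggingface_hub client. A minimal sketch (the endpoint URL below is a placeholder):

```python
# Minimal sketch: consume token streaming over plain HTTP from a TGI-backed
# Inference Endpoint. The endpoint URL is a placeholder.
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud")

# With stream=True, tokens are yielded as they are generated, all within one
# HTTP response.
for token in client.text_generation(
    "Explain server-sent events in one sentence.",
    max_new_tokens=100,
    stream=True,
):
    print(token, end="", flush=True)
```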

Hi Phil. Thanks for the response!

My use case uses a custom handler endpoint (handler.py). Is there a way I can set that up to work? For example, I could set up an SSE streaming response within the handler.py.

Any advice appreciated!

Edit: For full context, my application uses LlamaIndex with a remote call to OpenAI’s GPT-4. GPT-4 can produce streaming output, and I would like to forward the generator output to the client using SSE.
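For illustration, the generator I want to forward looks roughly like this (a sketch; the index construction and data path are placeholders):

```python
# Rough sketch of the streaming generator side (LlamaIndex + OpenAI); the index
# construction and data path are placeholders.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data(),
    service_context=service_context,
)

# streaming=True makes query() return a response whose response_gen yields
# tokens as they arrive from the model; this is the generator I would like to
# forward to the client via SSE.
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Summarise the documents.")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
```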

Custom handlers only support traditional HTTP request ↔ response. To add this, you would need to create a custom container yourself that implements the feature.
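A minimal sketch of such a custom container, assuming FastAPI and the OpenAI Python client; the /generate route and request schema are illustrative assumptions, not a fixed contract:

```python
# Hypothetical sketch of a custom container that exposes an SSE endpoint and
# forwards OpenAI's streamed chat completion chunks. The /generate route and
# request schema are illustrative assumptions.
import os

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class GenerateRequest(BaseModel):
    prompt: str


@app.post("/generate")
def generate(req: GenerateRequest):
    def event_stream():
        # Request a streamed chat completion and forward each content delta as
        # an SSE "data:" event.
        stream = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": req.prompt}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

Building this into an image, serving it with uvicorn, and pointing the endpoint’s custom container configuration at it is left out here.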

Is there a tutorial for this? I haven’t been able to get a custom container to work. It always ends up stuck at initializing, with the logs giving no insights. Does the model repository still need to have a handler.py file?

Hey Phil,

It ended up being an issue on my end with setting the right AWS credentials.
Thank you very much for your help.

Volker