Is it possible to have streaming responses from inference endpoints?
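For context, by "streaming" I mean receiving the generated output incrementally rather than waiting for the full response. Many inference services that do support this deliver the stream as server-sent-events-style `data:` chunks. This is a minimal sketch of the kind of client-side handling I have in mind; the endpoint URL, payload shape, and `[DONE]` sentinel are assumptions about a hypothetical provider, not a specific API:

```python
import json

def parse_sse_chunks(lines):
    """Yield decoded JSON payloads from SSE-formatted 'data: ...' lines.

    Assumes each event is a 'data: <json>' line and that the stream ends
    with a 'data: [DONE]' sentinel (a common but not universal convention).
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Against a real streaming endpoint, the iteration would look roughly like:
#   import requests
#   resp = requests.post(ENDPOINT_URL,            # hypothetical URL
#                        headers={"Authorization": f"Bearer {TOKEN}"},
#                        json={"inputs": "Hello"},
#                        stream=True)
#   for event in parse_sse_chunks(resp.iter_lines(decode_unicode=True)):
#       print(event)
```

Does the service expose something along these lines, or is the full response always buffered before being returned?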