Autoscaling on inference endpoints not initializing from 0 replicas

spacecon · March 6, 2024, 7:30am

Hello!

I have an inference endpoint whose replicas scale to 0 after 15 minutes without requests. Once it has scaled to 0, both an API call and the “Test your endpoint” section in the frontend result in a “500 Internal Server Error”, and the replica count does not scale back up to 1. I can see the request in the analytics tab as well.

The endpoint only scales up to 1 if I press a button in the frontend (shown in a screenshot in the original post).

In this case, the endpoint starts normally and accepts requests from the API as usual.
It might be important to note that I am using a custom handler.py.

I would like the inference endpoint to scale to 1 replica upon an API request.
Any help is highly appreciated!

Korfhage · June 19, 2024, 7:53am

Same problem here, any help would be welcome!

duckduckgrayduck · June 27, 2024, 8:56am

I resolved this by sending an initial probing request, to which I expect a 503 response, and then sleeping for 60 seconds while the endpoint initializes. I then send another request to confirm that it has woken up before sending my actual requests. See my example here: https://github.com/duckduckgrayduck/documentcloud-gumshoe-2-addon/blob/aa3806c3b46b7cf7c9118871e921f14102d75ef5/main.py#L82
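
For readers who do not want to dig through the linked file, the probe-then-wait approach described above could look roughly like the sketch below. It is not the code from the linked repository: the names `wake_endpoint` and `make_probe`, the retry limit, and the JSON payload are all assumptions made for illustration, and an endpoint with a custom handler may expect a different request body. The original post reports a 500 rather than a 503, so the sketch treats any non-200 status as "not ready yet".

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def wake_endpoint(
    probe: Callable[[], int],
    wait_seconds: float = 60.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the endpoint until it answers with HTTP 200.

    `probe` is a hypothetical callable that sends one small request and
    returns the HTTP status code. Returns True once a 200 is seen, or
    False if the endpoint is still not ready after `max_attempts` probes.
    """
    for attempt in range(max_attempts):
        if probe() == 200:
            return True
        # Any other status (503 in the reply above, 500 in the original
        # post) is treated as "still scaled down or starting": wait, retry.
        if attempt < max_attempts - 1:
            sleep(wait_seconds)
    return False


def make_probe(url: str, token: str, payload: dict) -> Callable[[], int]:
    """Build a probe that POSTs `payload` as JSON with a bearer token.

    The payload shape is an assumption; adjust it to whatever your
    handler.py expects.
    """
    def probe() -> int:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # Error statuses arrive as exceptions; report the code instead.
            return err.code
    return probe
```

Usage would be along the lines of `wake_endpoint(make_probe(ENDPOINT_URL, HF_TOKEN, {"inputs": "ping"}))`, where `ENDPOINT_URL` and `HF_TOKEN` are placeholders for your own values, followed by the real requests only if it returns `True`. Whether a probe request actually triggers the scale-up may depend on the endpoint configuration, so treat the 60-second wait and the retry count as starting points.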
