Autoscaling on inference endpoints not initializing from 0 replicas

Hello!

I have an inference endpoint configured to scale to 0 replicas after 15 minutes without requests. Once it has scaled to 0, both an API call and the “Test your endpoint” section in the frontend return a “500 Internal Server Error”, and the endpoint does not scale back up to 1 replica. The request does show up in the analytics tab.

The endpoint only scales up to 1 if I press this button in the frontend.

In this case, the endpoint starts normally and accepts API requests as usual.
It might be important to note that I am using a custom handler.py.

I would like the inference endpoint to scale to 1 replica upon an API request.
Any help is highly appreciated!

Same problem here, any help would be welcome!

I resolved this by sending an initial probing request, to which I expect a 503 response, and then sleeping for 60 seconds while the endpoint initializes. I then send another request to confirm that it has woken up before sending my actual requests. See my example here: https://github.com/duckduckgrayduck/documentcloud-gumshoe-2-addon/blob/aa3806c3b46b7cf7c9118871e921f14102d75ef5/main.py#L82
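
For anyone who wants the gist without opening the link, here is a minimal sketch of that probe-and-wait pattern. It is not the code from the linked repo. The function names, the `{"inputs": "ping"}` payload, and the retry count are my own assumptions, and it treats any status other than 200 as "not ready yet", since the original post saw a 500 rather than a 503.

```python
import time
import urllib.error
import urllib.request


def post_status(url, token, payload=b'{"inputs": "ping"}'):
    """Send one POST and return the HTTP status code instead of raising on 4xx/5xx.

    The payload is a placeholder; use whatever your handler.py accepts.
    """
    req = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 500/503 while the replica is still scaled down ends up here.
        return err.code


def wake_endpoint(send, wait_seconds=60, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns 200, sleeping between attempts.

    Returns True once the endpoint answers 200, False if it never does.
    """
    for attempt in range(max_attempts):
        if send() == 200:
            return True
        # No point sleeping after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(wait_seconds)
    return False
```

You would call it as something like `wake_endpoint(lambda: post_status(ENDPOINT_URL, HF_TOKEN))`, where `ENDPOINT_URL` and `HF_TOKEN` are placeholders for your own values, and only send the real requests once it returns True. Connection errors and timeouts are not caught here, so you may want to handle `urllib.error.URLError` as well.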