Hello!
I have an inference endpoint configured to scale to 0 replicas after 15 minutes without requests. Once it has scaled to 0, both an API call and the “Test your endpoint” section in the frontend result in a “500 Internal Server Error”, and the replica count does not scale back up to 1. I can see the request in the analytics tab as well.
The endpoint only scales up to 1 if I press this button in the frontend.
In that case, the endpoint starts normally and accepts API requests as usual.
It might be important to note that I am using a custom handler.py.
I would like the inference endpoint to scale to 1 replica upon an API request.
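To make the failing call concrete, here is a minimal sketch of the kind of client request involved, with a simple retry loop for 5xx responses. The URL, the token, the {"inputs": ...} payload shape, and the retry parameters are all placeholders and assumptions rather than my real values, and the helper names post_once and call_with_retry are made up for illustration. Given the behavior described above, I would expect a loop like this to keep receiving 500 until the endpoint is started manually.

```python
import json
import time
import urllib.error
import urllib.request

# Placeholders: substitute the real endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/my-endpoint"
API_TOKEN = "hf_xxx"


def post_once(url, token, payload):
    """Send one POST request and return (status_code, body_text)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # HTTPError carries the status code of 4xx/5xx responses.
        return error.code, error.read().decode("utf-8", errors="replace")


def call_with_retry(send, max_attempts=5, delay_seconds=10.0):
    """Call send() until it returns a non-5xx status or attempts run out.

    Returns a list of (status, body) tuples, one per attempt made.
    """
    results = []
    for attempt in range(max_attempts):
        status, body = send()
        results.append((status, body))
        if status < 500:
            break
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return results


def main():
    # Assumed payload shape; a custom handler may expect something else.
    attempts = call_with_retry(
        lambda: post_once(ENDPOINT_URL, API_TOKEN, {"inputs": "Hello"})
    )
    for status, body in attempts:
        print(status, body[:200])
```

Calling main() with real values prints one line per attempt, which makes it easy to see whether the status ever changes from 500 while the endpoint is scaled to 0.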
Any help is highly appreciated!