Inference Endpoint not stable

Hi. I have a model running on an AWS T4 instance. I have scale-to-zero set to never and autoscaling set to 2, so I was expecting to be able to provide a service that is up 24/7, but sadly that is not the case. From time to time the instance just dies and I get HTTP 503 (Service Unavailable) responses. The only fix is to manually restart the instance.

This is of course a deal breaker for someone who wants to run a stable service.

Has anyone else experienced this?


Could it be from errors when running the model? I have experienced this with oobabooga crashing when loading models it did not like, such as the multimodal ones. If it is a single-model inference endpoint, I would check the logs. In any case, t-series instances are pretty low powered; I would try a g4 or g5 to start with. Also, the t series does not have any GPUs, and some models require one.

Instance stability should hold regardless of the setup. Have you checked the logs, or can you post some so we can help? I also always like to set up automated monitoring, so when things go wrong it tells me what is happening and pinpoints the cause. My guess is it has to be CPU, GPU, or memory related, given the error you're getting.
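For example, a tiny probe like this is what I usually run on the side (a rough sketch; the endpoint URL and token below are placeholders you would swap for your own). It polls the endpoint every 30 seconds and timestamps every non-200 response, so you can see exactly when it falls over:

```python
# Minimal uptime probe: poll the endpoint every 30 s and log anything
# that is not HTTP 200, so the 503 windows show up with exact timestamps.
# ENDPOINT_URL and HF_TOKEN are placeholders for your own values.
import time
from datetime import datetime, timezone

import requests

ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."  # token with access to the endpoint

while True:
    now = datetime.now(timezone.utc).isoformat()
    try:
        r = requests.get(
            ENDPOINT_URL,
            headers={"Authorization": f"Bearer {HF_TOKEN}"},
            timeout=10,
        )
        if r.status_code != 200:
            print(f"{now} HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{now} request failed: {exc}")
    time.sleep(30)
```

Correlating those timestamps with the endpoint's own logs and metrics usually narrows down whether it is the container or the infrastructure.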

Having a similar problem. The endpoint mostly works fine, but we encounter this issue several times a day.

Our instance is configured as:

  • AWS eu-west-1
  • Intel Ice Lake 8 vCPU (no GPU)

Some additional details:

  • The 503 Service Unavailable responses last for about 10 seconds, then the endpoint recovers (see the retry sketch after this list).
  • The endpoint shows as active while erroring, and there are no errors in the logs.
  • We are using a custom Docker image that works fine on other hosting services.
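As a stopgap on our side, a retry wrapper along these lines papers over the brief outages (a minimal sketch; the endpoint URL, token, and the `query` helper are placeholders, not anything from the HF SDK):

```python
# Client-side workaround: retry POSTs that come back 503, since the
# outages last on the order of ten seconds before the endpoint recovers.
# ENDPOINT_URL and HF_TOKEN are placeholders for your own values.
import time

import requests

ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def query(payload, retries=5, wait=5.0):
    for _ in range(retries):
        r = requests.post(
            ENDPOINT_URL,
            headers={"Authorization": f"Bearer {HF_TOKEN}"},
            json=payload,
            timeout=30,
        )
        if r.status_code != 503:
            r.raise_for_status()
            return r.json()
        time.sleep(wait)  # endpoint usually recovers within ~10 s
    raise RuntimeError(f"still 503 after {retries} attempts")
```

It keeps our service responsive, but it obviously adds latency during the outage windows, so it is not a real substitute for a fix.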

Would appreciate it if this could be fixed - HF Inference Endpoints are very developer friendly and we would like to keep using them.


EDIT: Here is an analytics overview of a completely fresh instance and some requests. The upticks in 503 status errors should be pretty obvious.