I have an inference endpoint that I have been using for quite some time. I normally let it scale to 0 when I’m not using it. However, today I am unable to get it to start at all. I keep getting the following error:
[Server message]Endpoint failed to start. Scheduling failure: not enough hardware capacity
I’ve tried reaching out to HuggingFace to ask whether it is possible to reserve a GPU instance, because I really need the endpoint right now for a product launch.
Does anyone know of another way to get something like this up and running, or alternatively, how long these periods of no available hardware can last?
According to technical support, this issue should be resolved, but I’m still experiencing the same error on my endpoint. I’m still in conversation with the support team. Hopefully it’s resolved on your end?