Unable to start inference endpoint: not enough hardware capacity

I have an inference endpoint that I have been using for quite some time now. I normally let it scale to 0 when I'm not using it. However, today I am unable to get it to start at all. I keep getting the following error:

[Server message]Endpoint failed to start. Scheduling failure: not enough hardware capacity

The model I am running is meta-llama/Llama-2-13b-chat-hf, on a GPU [medium] · 1x Nvidia A10G instance.

I’ve tried reaching out to Hugging Face to ask whether it’s possible to reserve a GPU instance, because I really need it right now for a product launch.

Does anyone know of another way to get something like this up and running, or alternatively how long these periods of unavailable hardware tend to last?
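
In the meantime, something like the sketch below with huggingface_hub might work for retrying automatically until capacity frees up. This is only a rough idea, not an official workaround: the endpoint name is a placeholder, and the exact status strings the API reports may differ from what I've assumed here.

```python
import time

from huggingface_hub import get_inference_endpoint

# "llama-2-13b-chat" is a placeholder; use your endpoint's actual name.
endpoint = get_inference_endpoint("llama-2-13b-chat")

while True:
    endpoint.fetch()  # refresh state from the Inference Endpoints API
    if endpoint.status == "running":
        print(f"Endpoint is up: {endpoint.url}")
        break
    # Assumed status values; resume() asks the scheduler for capacity again.
    if endpoint.status in ("paused", "scaledToZero", "failed"):
        try:
            endpoint.resume()
        except Exception as err:
            print(f"resume() rejected ({err}); will retry")
    time.sleep(60)  # capacity shortages can persist, so poll slowly
```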

Hello @theneck,

We had a minor issue this morning in eu-west-1. This should now be fixed.
