I have an inference endpoint that I have been using for quite some time now. I normally let it scale to 0 when not using it. However today I am unable to get it to start at all. I keep getting the following error:
[Server message]Endpoint failed to start. Scheduling failure: not enough hardware capacity
The model I am running is meta-llama/Llama-2-13b-chat-hf, on a: GPU [medium] · 1x Nvidia A10G
I’ve tried reaching out to Hugging Face to ask whether it’s possible to reserve a GPU instance or something similar, because I need it right now for a product launch.
Does anyone know of another way to get something like this up and running, or alternatively how long these periods of unavailable hardware typically last?
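In the meantime, I'm considering just retrying the resume on a backoff loop until capacity frees up. A minimal sketch of what I mean is below; the huggingface_hub calls in the comment are my assumption about the client API, and the endpoint name is made up:

```python
import time

def retry_with_backoff(fn, attempts=6, base_delay=30.0):
    """Call fn() until it succeeds, sleeping base_delay * 2**i between failures."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts, surface the last error
            time.sleep(base_delay * 2 ** i)

# Hypothetical usage (API names assumed, endpoint name is a placeholder):
# from huggingface_hub import get_inference_endpoint
# endpoint = get_inference_endpoint("my-llama2-endpoint")
# retry_with_backoff(lambda: endpoint.resume())
```

This obviously doesn't solve the capacity problem itself, but it at least grabs a slot as soon as one opens up instead of me retrying by hand.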