API: Quota exceeded for machine error

Hello everyone, we are a team based in India whose product uses AI models deployed on Hugging Face.

We’ve been using AWS SageMaker for quite some time, but because of its autoscaling limitations we plan to move from SageMaker to Hugging Face. While doing so, we ran into an issue with the Hugging Face API in our backend. Our backend is responsible for resuming endpoints on HF (Hugging Face), which enables client → HF communication, and then for enabling autoscaling so that replicas scale down to 0 when there is no activity for some time.

The problem we face when resuming an endpoint through the API is:

PUT: https://api.endpoints.huggingface.cloud/v2/endpoint/xxx/xxx
{
    "error": "Quota exceeded for g4dn.xlarge. Currently available: 0, requested: 1. Please contact us at api-enterprise@huggingface.co to increase your quota."
}

It seems we’ve somehow exceeded our quota, which looks like a false alarm: we currently have 5 g4dn.xlarge machines, all scaled down to 0, and resuming any one of them fails with this error. Since no endpoint is running during this operation, the error message does not seem to apply.
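
For anyone who wants to reproduce or handle this on their side, here is a minimal sketch of how a backend could recognize this specific error and pull out the numbers it reports, so it can log them or back off instead of retrying blindly. The regular expression is based only on the single error message shown above, so the exact wording is an assumption and may not cover other variants; `parse_quota_error` is a hypothetical helper name, not part of any Hugging Face library.

```python
import json
import re

# Pattern derived from the one error message quoted in this post.
# Assumption: other quota errors follow the same wording.
QUOTA_RE = re.compile(
    r"Quota exceeded for (?P<instance>\S+)\. "
    r"Currently available: (?P<available>\d+), requested: (?P<requested>\d+)"
)


def parse_quota_error(body: str):
    """Return quota details if `body` is a quota error response, else None.

    Hypothetical helper: `body` is the raw response text of the PUT request.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, so not the error shape we know about
    if not isinstance(payload, dict):
        return None
    match = QUOTA_RE.search(str(payload.get("error", "")))
    if match is None:
        return None  # some other error, or no error at all
    return {
        "instance": match["instance"],
        "available": int(match["available"]),
        "requested": int(match["requested"]),
    }
```

With the response body above, this reports instance `g4dn.xlarge`, 0 available and 1 requested, which is exactly the mismatch we are asking about: nothing is running, yet 0 is reported as available.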

We’ve sent an email to the address mentioned in the error. Still, if there is something we should be aware of, we would appreciate help from anyone in the community. Thanks in advance!
