API: Quota exceeded for machine error

jayparmr · June 22, 2023, 4:57am

Hello everyone, we are a team based in India whose product uses AI models deployed on Hugging Face.

We’ve been using AWS SageMaker for quite some time, but because of its autoscaling limitations we plan to move from SageMaker to Hugging Face. While doing so, we ran into an issue with the Hugging Face API in our backend systems. Our backend is responsible for resuming endpoints on HF (Hugging Face), which enables client → HF communication, and then lets autoscaling scale the replicas down to 0 when there has been no activity for some time.

The problem we face when resuming an endpoint through the API is:

PUT: https://api.endpoints.huggingface.cloud/v2/endpoint/xxx/xxx
{
    "error": "Quota exceeded for g4dn.xlarge. Currently available: 0, requested: 1. Please contact us at api-enterprise@huggingface.co to increase your quota."
}
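For a backend that needs to tell this quota error apart from other failures, here is a minimal sketch that pulls the instance type and the two counts out of the response body. It assumes the message keeps the exact wording shown above, which is not a documented contract; `QuotaError` and `parse_quota_error` are hypothetical names introduced for illustration.

```python
import json
import re
from typing import NamedTuple, Optional


class QuotaError(NamedTuple):
    """Fields extracted from a quota error message (hypothetical helper type)."""
    instance_type: str
    available: int
    requested: int


# Assumption: the wording matches the error shown in this post.
_QUOTA_RE = re.compile(
    r"Quota exceeded for (?P<instance>\S+)\. "
    r"Currently available: (?P<available>\d+), requested: (?P<requested>\d+)"
)


def parse_quota_error(body: str) -> Optional[QuotaError]:
    """Return the parsed quota details, or None if the body is not a quota error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    message = payload.get("error")
    if not isinstance(message, str):
        return None
    match = _QUOTA_RE.search(message)
    if match is None:
        return None
    return QuotaError(
        instance_type=match["instance"],
        available=int(match["available"]),
        requested=int(match["requested"]),
    )
```

With this, the backend could log the instance type and counts, or raise a dedicated alert, instead of treating every failed resume the same way.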

It seems we’ve somehow exceeded our quota, which looks like a false alarm: we currently have five g4dn.xlarge machines scaled down to 0, and resuming any one of them fails with this error. Since no endpoint is running during this operation, the error message does not seem to apply.
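One way to double-check the "nothing is running" claim before each resume is to count the endpoints that are not idle, grouped by instance type. The sketch below works on plain records rather than calling the API; `EndpointInfo` and `instances_in_use` are hypothetical names, and the state strings `scaledToZero` and `paused` are assumptions about how the API reports idle endpoints.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class EndpointInfo:
    """Minimal endpoint record (hypothetical; fill it from your own API client)."""
    name: str
    instance_type: str
    state: str


def instances_in_use(
    endpoints: Iterable[EndpointInfo],
    idle_states: Tuple[str, ...] = ("scaledToZero", "paused"),  # assumed state names
) -> Counter:
    """Count endpoints per instance type whose state is not one of the idle states."""
    usage: Counter = Counter()
    for endpoint in endpoints:
        if endpoint.state not in idle_states:
            usage[endpoint.instance_type] += 1
    return usage
```

If the counter is empty for `g4dn.xlarge` and the resume call still fails, that is concrete evidence to include when contacting support about the quota.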

We’ve sent an email to the address mentioned above. Still, if there is something we should be aware of, we would appreciate help from anyone in the community. Thanks in advance!
