What is the quota on ZeroGPU for PRO users?

I am a PRO user and I have a Space running on ZeroGPU. It is pretty neat, but after a few inferences I get an error saying the quota is exceeded, with numbers like “Requested 42s and 60s”…

Three questions, please:

  1. What is the quota and how is it calculated?
  2. How do I interpret the numbers/seconds, e.g. “requested 42s on 60s”?
  3. What is the path towards getting to a point with no quota limitations like this?

I read another post, “GPU quota exceeded even when using access token from PRO”, but it does not really answer my specific questions.


Hi @amaraa

The quota on ZeroGPU for PRO users limits GPU usage to a specified time per session. ‘Requested 42s on 60s’ means your session requested 42 seconds of GPU time but was limited to 60 seconds.

@LLUMOAI, can you please elaborate?

I am still not sure what the quota is, based on your answer. Usually, when service providers talk about quotas, they publish figures like “60 requests per minute” or “3 GB per hour”. What is the GPU-time quota here?

And regarding “Requested 42s on 60s”: if it means the session requested 42 seconds of GPU time and the limit was 60 seconds, doesn’t that mean there is enough? The way I understand it, 60 seconds are available and my session requested 42 seconds.

Also, I noticed that when this error happens, it tells me to try again in about 8 minutes, and at times in 35 or even 40 minutes. So I am very confused about how this is calculated.


Hi @amaraa

  1. Quota: GPU quota can be complex, often tied to overall resource availability. Check your service plan details for specifics.
  2. “Requested 42s on 60s”: It means you requested 42 seconds within a 60-second window, and you were within limits. Issues may be due to the overall resource load.
  3. Retry Times: Suggested wait times reflect system load and resource allocation.

For detailed information, try reaching out to Hugging Face support.


These are not answers to the questions…

  1. What service plan are you talking about?
  2. Clearly this error is being triggered because the system believes we’re outside the limit.
  3. How are the numbers calculated?

My situation: despite not having submitted an inference request to any Space in many hours, my first attempt this evening yields: You have exceeded your GPU quota (58s left vs. 60s requested)

In the last 24 hours, I’ve submitted two or three requests (approximately 3-4 hours ago) to another user’s Space for auto-captioning a very small number of individual image uploads.
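Read literally, the messages in this thread report two numbers: the GPU seconds you have left and the seconds the call requested. The “requested” figure appears to match the `duration` a Space declares with the `@spaces.GPU` decorator (60s by default, according to the ZeroGPU documentation), which would explain why a call can be refused even when 58 of 60 seconds remain. The sketch below models that reading; the message regex and the “requested must fit in what is left” rule are assumptions inferred from these error strings, not documented Hugging Face behavior.

```python
import re
from dataclasses import dataclass

# Assumed format, taken from the messages quoted in this thread, e.g.
# "You have exceeded your GPU quota (58s left vs. 60s requested)".
_QUOTA_RE = re.compile(r"\((\d+)s left vs\. (\d+)s requested\)")


@dataclass
class QuotaStatus:
    left_s: int       # GPU seconds remaining in the current quota window
    requested_s: int  # GPU seconds the Space's function asked to reserve

    @property
    def allowed(self) -> bool:
        # Hypothetical admission rule: the whole requested duration must fit.
        return self.requested_s <= self.left_s


def parse_quota_error(message: str) -> QuotaStatus:
    """Extract the 'left' and 'requested' seconds from a quota error."""
    match = _QUOTA_RE.search(message)
    if match is None:
        raise ValueError(f"not a quota message: {message!r}")
    return QuotaStatus(left_s=int(match.group(1)), requested_s=int(match.group(2)))


status = parse_quota_error(
    "You have exceeded your GPU quota (58s left vs. 60s requested)"
)
print(status.left_s, status.requested_s, status.allowed)  # 58 60 False
```

Under this reading, lowering the declared duration (when the function really finishes sooner) would let a call through with less quota remaining.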


We have the same kind of issue on both a public and a private Space in the same organization.

We get things like this in the logs:

gradio.exceptions.Error: ‘No GPU is currently available for you after 60s’

and

gradio.exceptions.Error: ‘GPU task aborted’

and

gradio.exceptions.Error: ‘You have exceeded your GPU quota (33s left vs. 60s requested). Sign-up on Hugging Face to get more quotas or retry in 0:46:09’

The latter can happen even when we are signed in to an account and using a private Space.


You have exceeded your GPU quota (27s left vs. 180s requested). Please retry in 2 days, 0:00:00
This is what I get after sending one inference request…

I paid for a PRO account assuming I would get 5x more ZeroGPU resources. I ran 2 inferences over the last 2 days and one more today, and now I can’t run another. Could you please explain explicitly how this is allocated, or display this limit somewhere? Otherwise I don’t see any point in paying for a PRO account if I can’t run a couple of test requests.
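The retry delays quoted in this thread (“0:46:09”, “2 days, 0:00:00”) look like Python’s `str(datetime.timedelta)` formatting. Assuming that is what they are, a client script could turn them back into a `timedelta` and wait before retrying. This is a sketch based only on the strings quoted here; the function name is hypothetical and the format is not documented.

```python
import re
from datetime import timedelta

# Matches "0:46:09", "2 days, 0:00:00" or "1 day, 3:05:00" anywhere in a message.
_RETRY_RE = re.compile(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})")


def parse_retry_delay(message: str) -> timedelta:
    """Turn the 'retry in ...' part of a quota error into a timedelta."""
    match = _RETRY_RE.search(message)
    if match is None:
        raise ValueError(f"no retry delay found in {message!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),  # the days part is optional
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
    )


print(parse_retry_delay("Please retry in 2 days, 0:00:00"))  # 2 days, 0:00:00
```

A caller could pass `parse_retry_delay(msg).total_seconds()` to `time.sleep` instead of retrying blindly.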


You have to sign in to the Space itself (with its sign-in button), not just be logged in to Hugging Face, or the quota mitigation won’t work.
If someone else’s Space doesn’t have a sign-in button, you’ll have to duplicate the Space yourself and add your own sign-in button.
It is unclear whether this is temporary and will eventually be fixed, or whether it will stay that way.


I understand the problem: even though we paid for PRO, we just can’t have unlimited quota. What I wonder is this: in my own Spaces I can use my apps with ZeroGPU without limit, but not via the API (with the Gradio client pointing to my app). I assume that is because it doesn’t recognize me, but how can I tell my app (from code) that it’s me? It seems that using my hf_token doesn’t work. Or maybe there is no solution and that is the way it’s built. I’m asking you because you propose “sign in” on the Space as quota mitigation, but what is it mitigating, and how? Thanks in advance!
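For reference, this is roughly how a token is passed from Python. `gradio_client.Client` accepts an `hf_token` argument (check the signature in your installed version); reading it from the `HF_TOKEN` environment variable is a convention, and the Space id, input and `api_name` below are placeholders. Per the replies that follow, this did not lift the ZeroGPU quota at the time, so treat it as a sketch of what was tried rather than a fix.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the HF_TOKEN value, or None if it is unset or blank."""
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    return token or None


def make_client(space_id: str):
    """Build a gradio_client.Client that sends the caller's token.

    Per this thread, this did not lift the ZeroGPU quota at the time.
    """
    # Imported here so resolve_hf_token() works without gradio_client installed.
    from gradio_client import Client

    token = resolve_hf_token()
    if token is None:
        raise RuntimeError("set HF_TOKEN to a Hugging Face access token first")
    return Client(space_id, hf_token=token)


# Usage (needs network access; the Space id and api_name are placeholders):
# client = make_client("your-username/your-space")
# result = client.predict("some input", api_name="/predict")
```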


but how can I tell my app (from code) that it’s me? It seems that using my hf_token doesn’t work.

It’s probably a bug. An issue is in progress.


Thanks for your answer. These numbers are right; I have tested the 3 cases, and these are the quotas:
Not logged in: the quota is about 180s
Logged in: the quota is 300s
PRO user: the quota is 1500s…

My question is why my quota is unlimited when I use the web page of my Hugging Face Space, but limited (to 1500s, but still limited) when I use the Gradio client in Python.

I mean, both should be the same, shouldn’t they?
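To put the reported figures in perspective, the sketch below counts how many calls fit in each quota. The numbers are the ones measured by one user above, not official values, and the worst-case assumption (each call uses its full requested duration) is ours.

```python
# Quotas reported by one user in this thread; not official figures.
OBSERVED_QUOTA_S = {
    "not logged in": 180,
    "logged in": 300,
    "PRO user": 1500,
}


def calls_that_fit(quota_s: int, requested_s: int) -> int:
    """Worst case: how many back-to-back calls of `requested_s` seconds fit
    in a full quota, if every call uses its whole requested duration."""
    if requested_s <= 0:
        raise ValueError("requested_s must be positive")
    return quota_s // requested_s


for tier, quota in OBSERVED_QUOTA_S.items():
    print(f"{tier}: {calls_that_fit(quota, 60)} calls of 60s, "
          f"{calls_that_fit(quota, 180)} calls of 180s")
```

With these numbers, a 60s reservation fits 3, 5 and 25 times respectively, and a 180s one only 1, 1 and 8 times. The PRO figure is 5x the logged-in one, which matches the “5x” expectation mentioned earlier in the thread.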


Well, it’s probably a bug. If your login status is not recognized by HF, you will be treated as a guest, so the quota will be calculated for each IP. If you use HF from your mobile phone while you are on the move, for example, your IP will change frequently.

In such cases, the quota may be virtually unlimited.
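The reply above suggests that when login status is not recognized, usage is counted per IP address. The toy model below illustrates why that can make the quota look unlimited for a guest whose IP keeps changing, while a recognized account keeps a single bucket. The class and its rules are hypothetical, and it ignores how the real quota refills over time.

```python
from typing import Dict, Optional


class QuotaLedger:
    """Toy model of per-identity GPU-second accounting (hypothetical)."""

    def __init__(self, quota_s: int) -> None:
        self.quota_s = quota_s
        self.used_s: Dict[str, int] = {}

    @staticmethod
    def key(username: Optional[str], ip: str) -> str:
        # Recognized users are tracked by account; everyone else by IP address.
        return f"user:{username}" if username else f"ip:{ip}"

    def try_consume(self, username: Optional[str], ip: str, requested_s: int) -> bool:
        """Admit the call only if the requested seconds fit in what is left."""
        k = self.key(username, ip)
        left = self.quota_s - self.used_s.get(k, 0)
        if requested_s > left:
            return False
        self.used_s[k] = self.used_s.get(k, 0) + requested_s
        return True


guest = QuotaLedger(quota_s=180)
for _ in range(3):
    guest.try_consume(None, "203.0.113.5", 60)           # uses up the guest bucket
print(guest.try_consume(None, "203.0.113.5", 60))       # False: same IP, no quota left
print(guest.try_consume(None, "198.51.100.7", 60))      # True: new IP, fresh bucket
```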


You are right, the quota is actually calculated by IP. That is the problem: it seems that being a PRO user doesn’t help me. I’m no longer using HF login, so that’s not the point. Here I’m just testing two ways (I’m a PRO user):

  1. Manually, in my HF Space web page (this has no quota; I have been able to use it many times without being limited).
  2. Programmatically, in Python, using gradio_client (which limits me to just 1500s).

The question is: why is the quota different between method 1 and method 2? OK, maybe because I’m a PRO user and the HF Space web page detects it. But then, how is it supposed to detect that I’m a PRO user programmatically, with Python?


From the site description, it seems that 2 is working correctly and 1 is a bug, but the question is what causes 1’s bug and whether it’s a bug that needs to be fixed.

But then, how is it supposed to detect that I’m a PRO user programmatically, with Python?

I’m not too familiar with authentication in general either, but I think it’s managed by cookies or sessions, except when you press the sign-in button (OAuth). Anyway, I think the page is constantly communicating with the server, which issues the quota permissions. It works in real time, but it’s a similar system to the one used for the forums and the HF Hub. I don’t know what language the HF server-side program is written in.

Here is an explanation of how to get past the quota limit for PRO users when using the Gradio API: Usage quota exceeded - #7 by frostbyte07
