Exceeded GPU quota via API, but fine interactively

I have a Pro account and have created a Space using Gradio and a GPU. It works fine interactively, but when I connect to my Space via the Gradio API, after a few requests I get:
You have exceeded your GPU quota (60s requested vs. 58s left).
Create a free account to get more usage quota.
Can I somehow pass my credentials via the API in order to resolve this? I need to use an API to test the space I’ve created.
Thanks in advance.

Update: I set the Space to private and used the hf_token to connect via the Gradio API. It works for about 3 API calls and then I get the exact same error.
Why is it asking me to create a free account when I am using my access token?
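
For reference, a token-authenticated connection of the kind described in this update could look roughly like the sketch below. It assumes the `gradio_client` package is installed and that the token is stored in an `HF_TOKEN` environment variable. The Space ID is a placeholder, and `client_kwargs` and `connect` are hypothetical helper names. The `hf_token` keyword is the one used in this thread; other `gradio_client` releases may name it differently.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for gradio_client.Client from the environment."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        # Fail early instead of silently sending requests without credentials.
        raise RuntimeError("HF_TOKEN is not set")
    return {"hf_token": token}


def connect(space_id):
    """Open an authenticated client for a Space (space_id is a placeholder such as "user/space")."""
    # Imported lazily so this module still loads where gradio_client is absent.
    from gradio_client import Client

    return Client(space_id, **client_kwargs())
```

As the rest of the thread shows, authenticating this way did not by itself lift the quota error for the posters here.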

Ehh…
That Gradio error has been happening to me for months now, even when I use it interactively while logged into my Pro account…

I think the login part is simply not working.
It’s mentioned in the message, but I don’t think it actually works.

P.S.

You can report any bugs or specification oddities in Zero GPU to the Discussion section of this group.

My report as an example.

Thanks a mill for that. Login does work; I just got about 20 good API requests before it started to fail. It’s still working interactively, so it looks like there are quotas on the API, and these are not documented anywhere as far as I can see. It looks like I need to go elsewhere for a solution, like Runpod or Beam or another crew that offers GPUs on a usage basis as opposed to a per-hour basis. The A100 would cost US$3000 per month here. I’m surprised Hugging Face doesn’t offer access on a usage basis; after all, they do know how to share GPUs - that’s what Spaces is all about!

After hours of googling, I have to say that the Hugging Face articles on how to host on Runpod are absolutely awful. This was posted 3 weeks ago, but there is no Step 2; it doesn’t exist. These guys waste so much of our time:

but there is no Step 2, it doesn’t exist

Oh, come on, you’re kidding.:sweat_smile:
But it’s rather common at HF. I don’t know whether it worked at the time they wrote the article, or whether they wrote it because it’s theoretically possible in an ideal state…
Still, it’s a little unusual for an article to be wrong like that, unlike README.md files, which contain a lot of copy and paste.

I’m surprised HuggingFace don’t offer access on a usage basis; after all they do know how to share GPUs - that’s what spaces is all about!

I think it’s by the hour, not by the amount of usage, since we all share the same GPU at HF. I understand that’s the only way to do it, especially with the Zero GPU structure.
But I can very well understand that a billing plan on a per-usage basis would be helpful,
because right now you are charged even while the system is down due to errors. (I saw someone complaining about it on the forum.)

I think it would be good to submit a request if possible. Well, it won’t happen right now, but adding more plans is technically just writing a few lines. From a management standpoint, though, it would require a meeting or something.

Request Form:

I posted a follow-up request in the post:

Please provide more transparency about the PRO account ZeroGPU quota limit.

My Spaces run fine using the interface, but they won’t run through the API at all. Does that mean there are two different pools of quota for the API and the interface?

The error message was: The upstream Gradio app has raised an exception: You have exceeded your GPU quota (60s requested vs. 52s left). Create a free account to get more usage quota.
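
A client that wants to react to this error (for example, by logging how much quota is reported as left) could pull the two numbers out of the message. This is only a sketch: the pattern is based on the message text quoted in this thread, which may change, and `parse_quota_error` is a hypothetical helper name.

```python
import re

# Matches the "(60s requested vs. 52s left)" fragment quoted in this thread.
QUOTA_RE = re.compile(r"\((\d+)s requested vs\. (\d+)s left\)")


def parse_quota_error(message):
    """Return (requested_seconds, seconds_left), or None if the text does not match."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    requested, left = (int(group) for group in match.groups())
    return requested, left
```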

Even if I wait for a few days, the GPU quota available to me is still lower than 60s. Is there something wrong behind the scenes?

For now, I am forced to merge all my Gradio API repos into the front-end repo, which makes it gigantic and slow to rebuild and debug.

And now that I have consolidated all my API Spaces into one single front-end Space, the new problem is:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 399, in <module>
    demo.queue(max_size=16).launch(
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/gradio.py", line 142, in launch
    task(*task_args, **task_kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 348, in pack
    _pack(Config.zerogpu_offload_dir)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 340, in _pack
    pack = pack_tensors(originals, fakes, offload_dir, callback=update)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/packing.py", line 114, in pack_tensors
    os.posix_fallocate(fd, 0, total_asize)
OSError: [Errno 28] No space left on device
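
Errno 28 means the filesystem ran out of room while the file was being preallocated. One way to see how close a Space is to that limit is to log free disk space at startup. The sketch below uses only the standard library; the path to check is an assumption (the real offload directory is not shown in this thread), and `free_gib` and `has_room` are hypothetical helper names.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_bytes):
    """True if the filesystem containing `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # "." is a stand-in; point this at whichever directory the app writes to.
    print(f"free disk space: {free_gib('.'):.1f} GiB")
```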

Please make the quota limit shared between interactive usage and API usage under the same account!

I upvoted it.

I describe a working example here: Usage quota exceeded - #7 by frostbyte07

Thanks, I followed it, but it didn’t help. You explain how to manage a Space’s processing time via @spaces.GPU(duration=75), but that is only the number of seconds the app requests for processing. You also show how to call a Space with the HF_TOKEN: client = Client("your_duplicated_space/FLUX.1-dev", hf_token=os.getenv("HF_TOKEN"))

BUT neither of those answers makes our PRO quota apply: programmatically, we are still limited to 300 seconds, not the 1500 seconds we are entitled to :frowning:
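
To make the gap between 300 and 1500 seconds concrete, here is a rough model of how many calls fit in a quota. It rests on two assumptions that are not confirmed anywhere in this thread: that a call is admitted only when the remaining quota covers the requested duration (which the "60s requested vs. 58s left" wording suggests), and that each call is then charged a fixed number of seconds. `admitted_calls` is a hypothetical helper name.

```python
def admitted_calls(quota_s, requested_s, used_s):
    """Count calls admitted before the remaining quota drops below `requested_s`.

    quota_s:     total quota in seconds (e.g. 300 or 1500, as in the post above)
    requested_s: duration each call asks for (e.g. the value in @spaces.GPU(duration=...))
    used_s:      seconds assumed to be charged per call
    """
    if requested_s <= 0 or used_s <= 0:
        raise ValueError("requested_s and used_s must be positive")
    remaining, calls = quota_s, 0
    while remaining >= requested_s:
        remaining -= used_s
        calls += 1
    return calls
```

Under this model, requesting a shorter duration raises the number of admitted calls for the same quota, but it does nothing about which quota (300 or 1500 seconds) is applied to API requests.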
