Zero GPU Worker Error

[I did search first. The only other thread discussing a similar topic was deleted by its author, so here goes.]

For the last few days, while using ZeroGPU to generate content, I was getting quite a lot of “ZeroGPU worker error: RuntimeError” failures. They were eating up a lot of my daily 25 minutes of generation quota, but with intermittent retries I could at least finish a generation. Since today, however, the ZeroGPU worker errors appear almost instantaneously after clicking the generate button, making it impossible to generate anything at all.

Is anyone else facing (or has anyone faced) similar issues? And is there any workaround?

I was mostly using this module -

1 Like

I was able to reproduce the error. I thought it might be a bug, so I duplicated the Space, but the duplicate worked without any changes…:sweat_smile:

What is this… @hysts ?

Duplicated

Original

I tried Wan 2.2 5B - a Hugging Face Space by Wan-AI

Still getting the same error. So it may not be particular to the Space above, but rather affect all similar Spaces using ZeroGPU.

1 Like

Hmm, I’m not sure what the cause of the error is.

Looks like the Space was a modified version of Wan2.1 Fast - a Hugging Face Space by multimodalart, which is working fine.

The Space is not pinning dependency versions, so the error might be caused by dependency updates.
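For example, pinned entries in requirements.txt look something like this (the packages and version numbers below are just placeholders, not a known-good combination):

# Pin exact versions so every rebuild reproduces the same environment
gradio==5.1.0
diffusers==0.31.0
transformers==4.46.0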

I’ve asked internally about it, but it’s summer vacation season and many people are away, so we might not be able to get an answer right away.

1 Like

I have the exact same problem, and duplicating the Space didn’t work for me.
The problem is not specific to this Space; so far, every ZeroGPU Space I’ve tested has had the same problem.

1 Like

Oh. So even hysts can’t figure out the cause right away…
If I could reproduce the bug in my own environment, I could try some trial and error…

I was able to fix the issue on my Space by adjusting the requirements.txt.

Try adding these two lines to your requirements.txt and restart the Space:

safetensors
sentencepiece

1 Like

Which Spaces have you tried?
I get the same error in the Spaces mentioned above, but other than that, I can’t reproduce it.
I’ve tried duplicating some ZeroGPU Spaces, but they worked fine.
For example, these Spaces work fine.

1 Like

I’m in the same situation as hysts.

I am encountering the same error with Flux Kontext 1.1 Dev

1 Like

OK, I’ve just restarted FLUX.1 Kontext - a Hugging Face Space by black-forest-labs and it’s back up now. It’s probably unrelated to the issue discussed in this thread.

1 Like

Looking at the code, I found a change in Gradio’s specifications.
I can’t believe it, but could Gradio’s repeated specification changes be causing code conflicts…?

Gradio 4

cache_examples="lazy",

Gradio 5

cache_examples=True,
cache_mode="lazy",

Gradio 5 TODAY

cache_examples="lazy",

Edit:
This was a misunderstanding. It seems the older notation has always been accepted for compatibility reasons.

No, I don’t think so. The error in the log is related to CUDA. Also, if it were about caching examples, the Space probably wouldn’t be able to launch at all.

2 Likes

BTW, I’ve restarted WAN 2.1 Fast & security - a Hugging Face Space by Heartsync and it’s back up too. So, my guess is that it’s due to some dependency updates.

1 Like

Thank you.
Well, maybe the combination of library versions at the time the Space was launched was just bad.
Anyway, if it works fine after restarting, it’s probably not a big problem. :grinning_face:

The dependency issue still isn’t resolved, since we don’t know which library caused the error. Restarting a Space usually doesn’t rebuild it, which is why restarting fixed the original Space: it keeps the dependency versions from the existing build. Duplicating a Space, on the other hand, triggers a rebuild, and any dependencies that aren’t pinned get updated to their latest versions. That’s why the Wan 2.2 Space still isn’t working.
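To make the restart-versus-rebuild distinction concrete, here is a minimal sketch using huggingface_hub (the Space ID is a placeholder; restart_space reuses the already-built image, while factory_reboot=True forces a rebuild that re-resolves any unpinned dependencies):

from huggingface_hub import HfApi

api = HfApi()  # assumes a token with write access to the Space is configured

# Plain restart: keeps the existing image and its installed dependency versions
api.restart_space("your-username/your-space")

# Factory reboot: rebuilds the image, so unpinned dependencies are resolved again
api.restart_space("your-username/your-space", factory_reboot=True)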

1 Like

Looking at the code, I found a change in Gradio’s specifications.
I can’t believe it, but could Gradio’s repeated specification changes be causing code conflicts…?

Forgot to mention, but there’s a misunderstanding in this comment.

cache_examples still accepts "lazy" in gradio 5.x so that gradio 4.x Spaces that specify cache_examples="lazy" won’t break when upgrading to gradio 5.x; it just shows a warning that this will stop working in a future version. You mentioned that we’ve changed it repeatedly, but that’s not the case.
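For reference, a minimal sketch of both notations in a gradio 5.x app (the greet function and example inputs are just placeholders):

import gradio as gr

def greet(name):
    return f"Hello, {name}!"

# Current gradio 5.x notation: enable caching, choose the lazy strategy separately
demo = gr.Interface(
    fn=greet,
    inputs="text",
    outputs="text",
    examples=[["world"]],
    cache_examples=True,
    cache_mode="lazy",
)

# Legacy gradio 4.x notation, still accepted in 5.x with a deprecation warning:
# demo = gr.Interface(fn=greet, inputs="text", outputs="text",
#                     examples=[["world"]], cache_examples="lazy")

demo.launch()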

1 Like

Oh. In that Space, the build log shows:

 --> RUN wget --progress=dot:giga https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_575.51.03_linux.run -O cuda-install.run 	&& fakeroot sh cuda-install.run --silent --toolkit --override 	&& rm cuda-install.run

...

 --> RUN pip install --no-cache-dir pip -U && 	pip install --no-cache-dir 	datasets 	"huggingface-hub>=0.19" "hf_xet>=1.0.0,<2.0.0" "hf-transfer>=0.1.4" "protobuf<4" "click<8.1" "pydantic~=1.0" torch==2.8.0

So I modified requirements.txt:

#https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.7-cp310-cp310-linux_x86_64.whl
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.6.3+cu129torch2.8-cp310-cp310-linux_x86_64.whl
numpy>=1.23.5,<2
einops

Then it worked for now.

If PyTorch is now effectively pinned to 2.8 because of CUDA Toolkit 12.9, some programs may stop working…
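For what it’s worth, a quick way to confirm which versions a ZeroGPU build actually picked up is to log them at startup. A minimal sketch (the flash_attn import assumes the prebuilt wheel above installed correctly):

import torch

print("torch:", torch.__version__)             # expected 2.8.0 per the build log above
print("built with CUDA:", torch.version.cuda)  # expected 12.9

try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn is not installed")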

Oh, good catch! Not sure if the change to accept up to torch 2.8.0 is intentional since the documentation hasn’t been updated. I’ll ask internally.

1 Like

Hello, I’m experiencing the same “runtime error” on Pony Realism, and I don’t really understand how to resolve it. I’ve tried other applications like Image to Video, and after just a few videos, the problem also started to appear. I haven’t been able to generate any images for almost two days. This is starting to cause problems for my current work. Please help me. I need a simple solution that doesn’t require programming.

1 Like