CUDA related error and factory reboot not working

hysts · September 5, 2022, 12:23am

This Space recently stopped working properly. I tried a factory reboot, but it didn’t help.

The log shows the following error:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/routes.py", line 247, in run_predict
    output = await app.blocks.process_api(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 641, in process_api
    predictions, duration = await self.call_function(fn_index, processed_input)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 556, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/user/app/model.py", line 1241, in run_with_translation
    frames = self.run(text, seed, only_first_stage,image_prompt)
  File "/home/user/app/model.py", line 1178, in run
    set_random_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/SwissArmyTransformer/arguments.py", line 429, in set_random_seed
    torch.manual_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/random.py", line 40, in manual_seed
    torch.cuda.manual_seed_all(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
    _lazy_call(cb, seed_all=True)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/__init__.py", line 156, in _lazy_call
    callable()
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 111, in cb
    default_generator.manual_seed(seed)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
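
As the last line of the log suggests, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback should point at the call that actually failed rather than at a later one such as torch.manual_seed. Below is a minimal sketch of how that could be done, assuming the variable is set at the very top of the Space's entry script, before torch is imported. `cuda_status` is a hypothetical helper for illustration and is not part of the Space's code.

```python
import os

# Set this before the first CUDA call; the top of the entry script,
# before importing torch, is the safest place (assumption about app layout).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def cuda_status() -> str:
    """Hypothetical helper: report whether CUDA looks usable in this process."""
    if torch is None:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    return f"CUDA available: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```

Synchronous launches slow inference down considerably, so this setting is only meant for debugging and would be removed once the failing call has been identified.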

We haven’t changed the code for two weeks, and the Space was working fine until a few days ago, although we occasionally had to reboot it because of CUDA OOM errors (see this discussion). It also works fine in my GCP environment when I clone the Space and run it there.

How can I fix this?

chris-rannou · September 5, 2022, 8:56am

Currently investigating.

Edit: Fixed
