This Space recently stopped working properly. I tried a factory reboot, but it didn't help.
The log shows the following error:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/routes.py", line 247, in run_predict
    output = await app.blocks.process_api(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 641, in process_api
    predictions, duration = await self.call_function(fn_index, processed_input)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 556, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/user/app/model.py", line 1241, in run_with_translation
    frames = self.run(text, seed, only_first_stage,image_prompt)
  File "/home/user/app/model.py", line 1178, in run
    set_random_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/SwissArmyTransformer/arguments.py", line 429, in set_random_seed
    torch.manual_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/random.py", line 40, in manual_seed
    torch.cuda.manual_seed_all(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
    _lazy_call(cb, seed_all=True)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/__init__.py", line 156, in _lazy_call
    callable()
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 111, in cb
    default_generator.manual_seed(seed)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
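The last line of the log suggests passing CUDA_LAUNCH_BLOCKING=1. A minimal sketch of how that could be done from Python is shown here, assuming the variable is set at the very top of the Space's entry script, before torch is imported and before any CUDA call is made. The helper name enable_sync_cuda_errors is hypothetical and not part of the Space's code.

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Hypothetical helper: ask CUDA to launch kernels synchronously.

    With CUDA_LAUNCH_BLOCKING=1 the failing kernel should be reported at
    the call that launched it, instead of at a later, unrelated API call
    such as the manual_seed() frame shown in the traceback.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Assumption: this runs before `import torch`, so the setting is already
# in the environment when CUDA is first initialized.
enable_sync_cuda_errors()

# import torch  # import torch (and the model code) only after this point
```

Synchronous launches slow inference down, so this is only meant for reproducing the error and getting a more trustworthy stack trace. The same variable could instead be set in the environment that starts the app, without any code change.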
We haven't changed the code for two weeks, and the Space was working fine until a few days ago, although we occasionally had to reboot it because of CUDA OOM errors (see this discussion). It also works fine in my GCP environment when I clone the Space and run it there.
How can I fix this?