Due to an upgrade of the CUDA or bitsandbytes version, my model deployed on Hugging Face was no longer running properly. The error message was as follows:
================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
===========================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Setup Failed!
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
...
File "/home/user/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/user/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 22, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with details about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
Initially, I thought the issue was caused by Python 3.10, so I wanted to downgrade the Space's Python version to 3.8 but wasn't sure how to do it. Thanks to @John6666 for pointing me to the official documentation, which clarified that I can set the Python version for the Space by using the python_version parameter in the Space's README metadata.
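For reference, here is a minimal sketch of what the Space's README.md front matter can look like with the Python version pinned; the title, sdk, and app_file values below are placeholders and should match your own Space:

---
title: my-demo            # placeholder
sdk: gradio               # placeholder: use your Space's SDK
app_file: app.py          # placeholder
python_version: "3.8"     # pins the Space's Python version
---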
After setting the Python version to 3.8 as per the documentation, the Space still failed, which made me realize that the issue wasn't the Python version. Thanks to @geophysuni for sharing his solution, which led me to suspect that the problem was related to the versions of peft and bitsandbytes. I then searched for related issues online:
RuntimeError: CUDA Setup failed despite GPU being available. · Issue #1434 · bitsandbytes-foundation/bitsandbytes · GitHub
Unable to override PyTorch CUDA Version · Issue #1315 · bitsandbytes-foundation/bitsandbytes · GitHub
Following their advice to set bitsandbytes>=0.43.2 or bitsandbytes==0.44.1 still resulted in errors, but when I happened to set bitsandbytes==0.41.0, the error stopped appearing. Note that after pinning bitsandbytes==0.41.0 I also needed to install the scipy package; after that, my model deployed and ran successfully on the Space.
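In other words, the relevant lines in the Space's requirements.txt end up looking roughly like this (alongside whatever other packages your app needs, such as peft and transformers):

# pin bitsandbytes to a release whose prebuilt binaries load on the Space
bitsandbytes==0.41.0
# needed alongside bitsandbytes 0.41.0, as noted above
scipy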
In summary, the CUDA version on Hugging Face Spaces is currently 12.3, which is not compatible with either the older bitsandbytes version (0.37.0) or the latest version (0.45.0); the version that worked for me is 0.41.0.
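As a quick sanity check after changing the pins, a small snippet like the following (run inside the Space, e.g. at the top of app.py) will trigger bitsandbytes' CUDA setup on import and show which CUDA version PyTorch sees; the exact output depends on your environment:

import torch
import bitsandbytes  # importing runs bitsandbytes' CUDA setup and raises if it fails
from importlib.metadata import version

print("torch CUDA version:", torch.version.cuda)      # CUDA version PyTorch was built against
print("CUDA available:", torch.cuda.is_available())   # True if the Space's GPU is visible
print("bitsandbytes version:", version("bitsandbytes"))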
Finally, I want to thank everyone for their help in identifying and resolving the issue.