Hi Adrian,
Thank you for your response. Here is the output of the nvidia-smi command:
(fixedprompt) (base) exouser@possibly-right-crawdad:~/Squeezed-Attention/LongBench$ nvidia-smi
Thu Mar 20 16:58:13 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB Off | 00000000:04:00.0 Off | 0 |
| N/A 25C P0 52W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
It does seem like I am running this on a GPU node. nvcc --version returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
If I understand correctly, there is no incompatibility between the CUDA version and the driver version. However, the command below indicates that CUDA is not available, even though nvidia-smi shows a GPU on the node:
(fixedprompt) (base) exouser@possibly-right-crawdad:~/Squeezed-Attention/LongBench$ python
Python 3.9.21 | packaged by conda-forge | (main, Dec 5 2024, 13:51:40)
[GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/exouser/miniconda3/envs/fixedprompt/lib/python3.9/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
False
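For what it's worth, here is a small diagnostic script I could run in the same fixedprompt environment to gather more detail. This is just my own sketch: the relevance of CUDA_VISIBLE_DEVICES and the /dev/nvidia* device nodes is my assumption, not something the warning itself confirms.

import os
import glob
import torch

# Report the PyTorch build and the CUDA runtime it was compiled against.
# A value of None for torch.version.cuda would mean a CPU-only wheel.
print("torch version:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)

# Environment variable that can hide GPUs from the process (assumption: it might be set by the scheduler).
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))

# Check that the NVIDIA device nodes exist and are accessible to this user.
for dev in sorted(glob.glob("/dev/nvidia*")):
    print(dev, "read/write OK:", os.access(dev, os.R_OK | os.W_OK))

# Visible device count; 0 here would be consistent with is_available() returning False.
print("device count:", torch.cuda.device_count())

If torch.version.cuda comes back as None, that would point at a CPU-only PyTorch install rather than a driver problem; otherwise I suspect the issue is at the driver/device-access level, but I may be wrong.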