RuntimeError: Found no NVIDIA driver on your system when running on NVIDIA A10G Large

safihaider · September 1, 2023, 6:50am

Hi Guys,

I created a Space with NVIDIA A10G hardware and the blank Docker template, then pushed my Dockerfile together with the script that fine-tunes my private model. I get this error:

/home/admin/.local/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/admin/.local/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Is CUDA available: False
Traceback (most recent call last):
  File "/app/train.py", line 20, in <module>
    print(f"CUDA device: {torch.cuda.get_device_name(torch.cuda.current_device())}")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/.local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/home/admin/.local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

I added some commands to my Dockerfile and checked the build logs, which show no NVIDIA GPU present on my Space. Here are the logs:

--> RUN lspci -vnn | egrep 'VGA|3D'
lspci: Unable to load libkmod resources: error -2
00:01.3 Non-VGA unclassified device [0000]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 08)
00:03.0 VGA compatible controller [0300]: Amazon.com, Inc. Device [1d0f:1111] (prog-if 00 [VGA controller])
DONE 0.0s

Has anybody else faced this issue?
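For context, the log above prints "Is CUDA available: False" and then crashes on the next line, because torch.cuda.current_device() initializes CUDA even when no driver is visible. A minimal sketch of a guarded version of that print follows. The helper name describe_device is hypothetical and not part of the original train.py; it only avoids the crash and does not make a GPU appear.

```python
def describe_device() -> str:
    """Return a printable device description without raising when CUDA is missing.

    Hypothetical helper for illustration; not taken from the original train.py.
    """
    try:
        import torch  # imported lazily so the check also works where torch is absent
    except ImportError:
        return "cpu (torch not installed)"

    # is_available() returns False instead of raising when no driver is found.
    if not torch.cuda.is_available():
        return "cpu (CUDA not available)"

    # Only touch current_device()/get_device_name() once CUDA is known to work.
    return torch.cuda.get_device_name(torch.cuda.current_device())


if __name__ == "__main__":
    print(f"Device: {describe_device()}")
```

With a check like this, the script reports the CPU fallback instead of stopping with a traceback, which makes it easier to see whether the GPU is missing at build time or at runtime.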

radames · September 4, 2023, 5:23am

Hi @safihaider, could you please share an example of your Dockerfile?
Here’s an example of a Dockerfile that is compatible with our GPU hardware. In general, basing your image on a CUDA image, for example FROM nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04, is strongly advised.

safihaider · September 4, 2023, 8:48am

Hi @radames ,

Yes, I tried different GPU templates from nvidia/cuda, including the one you mentioned, and the error still exists. I checked the CUDA version and it’s available. This is the content of my Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
FROM nvcr.io/nvidia/pytorch:22.08-py3
RUN nvcc -V
RUN nvidia-smi

These are the build logs on my space:

===== Build Queued at 2023-09-03 08:19:07 / Commit SHA: 298bbcd

===== --> FROM nvcr.io/nvidia/pytorch:22.08-py3@sha256:1aa83e1a13f756f31dabf82bc5a3c4f30ba423847cb230ce8c515f3add88b262 

DONE 0.0s 

DONE 26.3s 

DONE 27.2s 

DONE 57.8s 

DONE 59.3s 

DONE 74.1s 

DONE 75.0s 

DONE 75.1s 

--> RUN nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

DONE 0.2s

--> RUN nvidia-smi
/bin/bash: nvidia-smi: command not found

--> ERROR: process "/bin/sh -c nvidia-smi" did not complete successfully: exit code: 127

As you can see, CUDA is present, but the nvidia-smi command could not be found.

radames · September 8, 2023, 10:31pm

Hi @safihaider, I don’t think you can run nvidia-smi at Docker build time, since the machine that builds the Docker image doesn’t have a GPU. Once the image is deployed on hardware with a GPU, you can run these commands.

github.com/NVIDIA/nvidia-docker

Use nvidia-smi in Dockerfile

opened 11:15AM - 20 Oct 16 UTC

closed 04:52PM - 04 Nov 16 UTC

Josca

work as intended

Hello, I would like to call nvidia-smi in Dockerfile, but docker building fails. My Dockerfile:

FROM nvidia/cuda:7.5-cudnn5-devel
RUN nvidia-smi
CMD /bin/bash

I am using building command: nvidia-docker build -t gpu ., but error message is displayed:

/bin/sh: 1: nvidia-smi: not found

When I build another docker image based on nvidia/cuda:7.5-cudnn5-devel and run container using such image, command nvidia-smi works. It seems nvidia GPU and its libraries are not available during docker image building. Could you help me?
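Since the GPU is only available once the container is running, the check can move from a RUN step in the Dockerfile into the script that the container starts. Below is a minimal sketch using only the Python standard library. The function name gpu_visible is hypothetical, and the sketch assumes that nvidia-smi is on PATH when the container runs on GPU hardware, as the reply above suggests.

```python
import shutil
import subprocess


def gpu_visible(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    Hypothetical helper for illustration. Intended to run when the container
    starts, not in a Dockerfile RUN step, where no GPU is attached.
    """
    smi = shutil.which("nvidia-smi")
    if smi is None:
        # Matches the "nvidia-smi: command not found" seen in the build logs.
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <index>:".
        result = subprocess.run(
            [smi, "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    print(f"GPU visible at runtime: {gpu_visible()}")
```

Calling this at the top of the training script, rather than during the image build, reports what the running container can actually see.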
