AutoTrain Advanced: FATAL ERROR: NVIDIA Management Library (NVML) not found

rishabhchopra · April 9, 2024, 2:32pm

Hi, I’m following along with this video, but instead of using a zip file, I’m using the chest X-ray dataset hosted on the Hub from Julien Simon’s video.

I’m getting the following error in my logs when I click on Train:

FATAL ERROR: NVIDIA Management Library (NVML) not found.
HINT: The NVIDIA Management Library ships with the NVIDIA display driver (available at
      https://www.nvidia.com/Download/index.aspx), or can be downloaded as part of the
      NVIDIA CUDA Toolkit (available at https://developer.nvidia.com/cuda-downloads).
      The lists of OS platforms and NVIDIA-GPUs supported by the NVML library can be
      found in the NVML API Reference at https://docs.nvidia.com/deploy/nvml-api.

I’ve already added my HF_TOKEN, and tried:

Restarting the Space
Factory Reboot

Why is this error occurring? What can I do to fix this error?

For context, I’m creating this project to teach young students how to leverage the no-code interfaces on Hugging Face and Teachable Machine Learning.

abhishek · April 9, 2024, 6:08pm

You need to specify a train split.

AleAle2423 · April 13, 2024, 12:29am

I’m having a similar error, and specifying the training split does not fix it. It’s suggesting this: "pip3 install --force-reinstall nvidia-ml-py"

13Ankur · April 15, 2024, 11:01am

Facing the same issue.

abhishek · April 15, 2024, 11:12am

That’s just a warning. Ignore it.
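The NVML message means the process could not find NVIDIA's driver library, which is expected on CPU-only hardware or on a Mac; reinstalling the Python binding cannot supply a missing driver. As a minimal sketch, assuming the nvidia-ml-py package (which provides the `pynvml` module) and using hypothetical helper names, you can check whether NVML actually sees a GPU:

```python
# Minimal sketch: probe whether NVML (and therefore an NVIDIA driver/GPU) is usable.
# Assumes the nvidia-ml-py package, which provides the `pynvml` module.
# Helper names are hypothetical, chosen for illustration only.
from importlib import metadata


def installed_nvml_binding_version():
    """Return the installed nvidia-ml-py version, or None if it is not installed."""
    try:
        return metadata.version("nvidia-ml-py")
    except metadata.PackageNotFoundError:
        return None


def nvml_gpu_count():
    """Return the number of NVIDIA GPUs NVML can see (0 if NVML is unavailable)."""
    try:
        import pynvml
    except ImportError:
        return 0  # Python binding not installed at all
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return 0  # binding present, but no NVIDIA driver library (CPU hardware, Mac, ...)
    try:
        return pynvml.nvmlDeviceGetCount()
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print("nvidia-ml-py version:", installed_nvml_binding_version())
    print("NVIDIA GPUs visible to NVML:", nvml_gpu_count())
```

A count of 0 on CPU hardware is consistent with the NVML message being harmless there.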

13Ankur · April 15, 2024, 12:31pm

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
FATAL ERROR: NVIDIA Management Library (NVML) not found.
HINT: The NVIDIA Management Library ships with the NVIDIA display driver (available at
      https://www.nvidia.com/Download/index.aspx), or can be downloaded as part of the
      NVIDIA CUDA Toolkit (available at https://developer.nvidia.com/cuda-downloads).
      The lists of OS platforms and NVIDIA-GPUs supported by the NVML library can be
      found in the NVML API Reference at https://docs.nvidia.com/deploy/nvml-api.

13Ankur · April 15, 2024, 12:31pm

Facing the above issue when running AutoTrain.
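The ValueError above matches what scikit-learn's stratified splitting (for example `train_test_split(..., stratify=y)`) raises when a label occurs only once: a stratified split needs at least two examples of every class. Presumably AutoTrain hits this when it carves a validation set out of the training data, but that is an assumption. A minimal sketch, with hypothetical helper names and toy labels, for finding and dropping such classes before uploading the dataset:

```python
# Minimal sketch: find classes too small for a stratified train/validation split.
# `rare_classes` and `drop_rare` are hypothetical helpers; the labels are toy data.
from collections import Counter


def rare_classes(labels, min_count=2):
    """Return {label: count} for every class with fewer than `min_count` examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_count}


def drop_rare(examples, label_key="label", min_count=2):
    """Remove examples whose class has fewer than `min_count` members."""
    rare = rare_classes([ex[label_key] for ex in examples], min_count)
    return [ex for ex in examples if ex[label_key] not in rare]


print(rare_classes(["cat", "dog", "dog", "cat", "bird"]))  # {'bird': 1}
```

Adding more examples of the rare class, or supplying an explicit validation split, are alternatives to dropping it.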

13Ankur · April 15, 2024, 12:32pm

Adilmar · April 17, 2024, 1:28pm

Facing the same issue.

abhishek · April 17, 2024, 1:59pm

Use a GPU for large models.
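Whether training can use a GPU depends on what the underlying framework sees. A minimal sketch, assuming PyTorch is the backend (`detect_device` is a hypothetical helper), for checking this on a Space or a local machine:

```python
# Minimal sketch: report which accelerator PyTorch can use on this machine.
# Assumes PyTorch is the training backend; `detect_device` is a hypothetical helper.


def detect_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch installed, assume CPU only
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU: the case NVML and bitsandbytes expect
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU (e.g. M1/M2)
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", detect_device())
```

On CPU-only Space hardware this should print `cpu`, which is when large models become impractically slow.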

arunguvi · April 27, 2024, 3:07pm

I am facing the same issue when running locally on a Mac M2 Pro.

I also got this error when I ran autotrain app:
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGetMemoryInfo. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.

abhishek · April 27, 2024, 3:18pm

This is not an error. You can ignore it and move on. On a MacBook, you need to disable quantization.

arunguvi · April 27, 2024, 4:45pm

OK, I removed it as you mentioned and it ran as expected.
But I can see an error mentioning no GPU support for bitsandbytes.
It looks like bitsandbytes does not support MPS yet (Support for Apple silicon · Issue #252 · bitsandbytes-foundation/bitsandbytes · GitHub).

I get these errors:

/Users/temme/Documents/pogo/framework-validator/autotrain/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'

My question is: does it use the CPU or the GPU?

abhishek · April 27, 2024, 6:03pm

You need to set quantization to none. Please follow the parameters here: How to Finetune phi-3 on MacBook Pro
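The `'NoneType' object has no attribute 'cadam32bit_grad_fp32'` error appears to come from a bitsandbytes optimizer calling into a compiled library that was never loaded, because that build has no GPU support. The advice above amounts to enabling bitsandbytes features only on CUDA. A minimal sketch of that decision: the dictionary keys and the `"int4"` value are hypothetical placeholders, not AutoTrain's documented parameters; `"none"` mirrors this thread, and the optimizer names are Hugging Face Transformers `optim` values.

```python
# Minimal sketch: choose CUDA-dependent settings only when CUDA is actually present.
# The keys "quantization"/"optimizer" and the value "int4" are hypothetical placeholders,
# not AutoTrain's documented parameter names. "none" mirrors the advice in this thread;
# "paged_adamw_8bit" and "adamw_torch" are Transformers `optim` values.


def training_settings(device):
    """Return quantization/optimizer choices that avoid bitsandbytes off CUDA."""
    if device == "cuda":
        # bitsandbytes quantization and 8-bit optimizers need its CUDA build.
        return {"quantization": "int4", "optimizer": "paged_adamw_8bit"}
    # On MPS or CPU, skip quantization and use plain PyTorch AdamW.
    return {"quantization": "none", "optimizer": "adamw_torch"}


print(training_settings("mps"))  # {'quantization': 'none', 'optimizer': 'adamw_torch'}
```

Keying the choice on the detected device keeps one configuration usable across a GPU Space and a MacBook.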

arunguvi · April 28, 2024, 5:20am

Thanks abhishek, I was able to run it with TinyLlama.
