Accelerate doesn't seem to use my GPU?

mihaien · April 3, 2024, 9:10am

Hi,

I’m trying to run the example script diffusers/examples/unconditional_image_generation/train_unconditional.py (main branch of huggingface/diffusers on GitHub) to train a stable diffusion model on my dataset. I followed all the steps in the tutorial and configured Accelerate as follows:

In which compute environment are you running?
Please select a choice using the arrow or number keys, and selecting with enter

This machine
Which type of machine are you using?
Please select a choice using the arrow or number keys, and selecting with enter
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:[all]
Do you wish to use FP16 or BF16 (mixed precision)?
Please select a choice using the arrow or number keys, and selecting with enter
no

When running accelerate env in the terminal:
Copy-and-paste the text below in your GitHub issue

Accelerate version: 0.28.0
Platform: Windows-10-10.0.19045-SP0
Python version: 3.10.9
Numpy version: 1.26.3
PyTorch version (GPU?): 2.2.2+cu118 (True)
PyTorch XPU available: False
PyTorch NPU available: False
System RAM: 13.86 GB
GPU type: NVIDIA GeForce GTX 1650 (ignore my poor GPU XD, I’m a student and a beginner in ML)
Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: NO
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- gpu_ids: [all]
- rdzv_backend: static
- same_network: True
- main_training_function: main
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env:

But when I launch the script using the command in the tutorial, I see that Accelerate is not using my GPU, but the CPU:
accelerate launch train_unconditional.py --dataset_name="mihaien/my-dataset" --resolution=64 --center_crop --random_flip --output_dir="ddpm-metaphors-64" --train_batch_size=16 --num_epochs=50 --gradient_accumulation_steps=1 --use_ema --learning_rate=1e-4 --lr_warmup_steps=500 --mixed_precision=no --push_to_hub
04/03/2024 12:05:07 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: no

I also tried a solution that I found in another topic, but it doesn’t seem to work for me: "Accelerate on single GPU doesnt seem to work" (reply #2 by xtcgoat).

PS: I’m using PyCharm.

Could someone please help me? Thank you so much!

swtb · April 4, 2024, 10:00am

Do you have torch built for GPU?
Do you have the right CUDA and cuDNN libraries installed?
The command below installs the GPU build of torch for CUDA 12.1:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

GPU type: NVIDIA GeForce GTX 1650 (ignore my poor GPU XD, I’m a student and a beginner in ML)

By the way, Google Colab is free to use and comes with a 16GB GPU. Alternatively, since you mentioned you are a student, your institution or a local one may give you access to its high-performance computing environment. Most universities and colleges have some form of GPU cluster.

mihaien · April 4, 2024, 12:11pm

Yes, I installed the CUDA and cuDNN libraries for version 11.8 (as suggested in the topic I linked in my first post). I also ran the command from the PyTorch site: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
I have some resources at my university, but first I have to check that everything works as expected on my local computer, without errors or problems, because the resources I have access to are limited.
I tried printing to the console whether CUDA is available: when I simply run the script from PyCharm, torch.cuda.is_available() prints True, but when I launch the script with accelerate, torch.cuda.is_available() prints False.
Should I upgrade the version of CUDA to 12.1?
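
One way to narrow down a difference like this is to print the same details from both launch methods and compare them. The snippet below is only a minimal sketch: the helper name describe_runtime is hypothetical, and it assumes the mismatch comes from a different Python interpreter or a different CUDA_VISIBLE_DEVICES value, which may not be the actual cause here.

```python
import os
import sys


def describe_runtime():
    """Collect details that can differ between an IDE run and `accelerate launch`."""
    info = {
        # Which interpreter (and therefore which environment) is running.
        "python_executable": sys.executable,
        # None means the variable is unset, so all GPUs should be visible.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter's environment.
        info["torch_version"] = None
        info["cuda_available"] = False
    else:
        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Saving this as a small script and running it once from PyCharm and once through accelerate launch should show whether both runs use the same interpreter and the same torch build.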

swtb · April 4, 2024, 3:27pm

I would think accelerate would give a more descriptive error if the versioning was the problem. Try launching like this:

CUDA_VISIBLE_DEVICES=0 accelerate launch train_script.py

(With your config too)
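
The VAR=value prefix in the command above is POSIX shell syntax, and the original poster is on Windows. A shell-independent alternative is to set the variable from Python and start accelerate as a subprocess. This is a rough sketch, not an official Accelerate recipe: the helper names build_launch_env and launch are hypothetical, and train_script.py is the placeholder name from the command above.

```python
import os
import subprocess


def build_launch_env(gpu_ids="0", base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return env


def launch(script="train_script.py", script_args=(), gpu_ids="0"):
    """Run `accelerate launch <script> <args>` with the visible GPUs pinned."""
    command = ["accelerate", "launch", script, *script_args]
    # check=False: return the CompletedProcess instead of raising on failure.
    return subprocess.run(command, env=build_launch_env(gpu_ids), check=False)
```

For example, launch("train_unconditional.py", ["--resolution=64"]) would start the training script with only GPU 0 visible, assuming the accelerate executable is on the PATH.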

muellerzr · April 4, 2024, 4:32pm

Might be the same as "Cuda becomes unavailable and script is excuted by multiple times" (Issue #2622 in huggingface/accelerate on GitHub)? Are you using conda?

mihaien · April 4, 2024, 7:16pm

This worked for me!! Thank you so much!!

swtb · April 5, 2024, 12:44pm

Glad it worked.

If you manage to get set up on your university's compute environment, you will likely end up using SLURM for job submission. At that point you can export CUDA_VISIBLE_DEVICES=0 as an environment variable in your job script.

For me that looks like:

export CUDA_VISIBLE_DEVICES="0"

python main.py

GangTu · September 18, 2024, 1:42pm

I solved this by adding the parameter device="cuda" to the pipeline function call:

classifier = pipeline('sentiment-analysis', device="cuda")
