Accelerate doesn't seem to use my GPU?

Hi,

I'm trying to run the following example script, diffusers/examples/unconditional_image_generation/train_unconditional.py at main · huggingface/diffusers · GitHub, to train a diffusion model on my dataset. I followed all the steps in the tutorial and configured Accelerate as follows:

In which compute environment are you running?
Please select a choice using the arrow or number keys, and selecting with enter

  • This machine
    Which type of machine are you using?
    Please select a choice using the arrow or number keys, and selecting with enter
  • No distributed training
    Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
    Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
    Do you want to use DeepSpeed? [yes/NO]: NO
    What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:[all]
    Do you wish to use FP16 or BF16 (mixed precision)?
    Please select a choice using the arrow or number keys, and selecting with enter
  • no

When running accelerate env in the terminal:
Copy-and-paste the text below in your GitHub issue

  • Accelerate version: 0.28.0
  • Platform: Windows-10-10.0.19045-SP0
  • Python version: 3.10.9
  • Numpy version: 1.26.3
  • PyTorch version (GPU?): 2.2.2+cu118 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • System RAM: 13.86 GB
  • GPU type: NVIDIA GeForce GTX 1650 (ignore my poor GPU XD, I'm a student and a beginner in ML)
  • Accelerate default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: NO
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 1
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: [all]
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env:

But when I launch the script using the command in the tutorial, I see that Accelerate is not using my GPU, but the CPU:
accelerate launch train_unconditional.py --dataset_name="mihaien/my-dataset" --resolution=64 --center_crop --random_flip --output_dir="ddpm-metaphors-64" --train_batch_size=16 --num_epochs=50 --gradient_accumulation_steps=1 --use_ema --learning_rate=1e-4 --lr_warmup_steps=500 --mixed_precision=no --push_to_hub
04/03/2024 12:05:07 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: no

I also tried a solution that I found, but it doesn't seem to work for me: Accelerate on single GPU doesnt seem to work - #2 by xtcgoat

PS: I'm using PyCharm.

Could someone please help me? Thank you so much!

Do you have torch built for GPU?
Do you have the right CUDA and cuDNN libraries installed?
The command below installs the GPU build of torch for CUDA 12.1:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  • GPU type: NVIDIA GeForce GTX 1650 (ignore my poor GPU XD, I'm a student and a beginner in ML)

By the way, Google Colab is free to use and comes with a 16GB GPU. Alternatively, since you mentioned you are a student, you may find that your institution, or even a local one, will give you access to its high-performance computing environment. Most universities and colleges have some form of GPU cluster.

Yes, I installed the CUDA and cuDNN libraries for version 11.8 (as suggested in the thread I linked in my post). I also ran the command from the PyTorch site: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
I have some resources at my university, but first I have to check that everything works as expected on my local computer and that I don't have any errors or problems, because the resources I have access to are limited.
I tried printing to the console whether CUDA is available: when I simply run the script from PyCharm, torch.cuda.is_available() prints True, but when I launch the script with accelerate, torch.cuda.is_available() prints False.
Should I upgrade the version of CUDA to 12.1?
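
For anyone comparing the two launch methods, a small diagnostic along these lines may help narrow things down. It is only a sketch: `cuda_report` is a hypothetical helper name, and the idea is to run the same file once from PyCharm and once through accelerate launch, then compare the output. A different interpreter path or a different CUDA_VISIBLE_DEVICES value between the two runs would point at the environment rather than at the CUDA install.

```python
import importlib.util
import os
import sys


def cuda_report():
    """Collect the facts that most often differ between two launch methods."""
    report = {
        # Which interpreter is running; the IDE and the terminal may differ.
        "python": sys.executable,
        # None means the variable is unset; an empty string or "-1" hides every GPU.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": None,
    }
    # Only import torch if it is installed for this interpreter.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_version"] = torch.__version__
        # torch.version.cuda is None on CPU-only builds.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```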

I would think accelerate would give a more descriptive error if the versioning was the problem. Try launching like this:

CUDA_VISIBLE_DEVICES=0 accelerate launch train_script.py

(With your config too)
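
Note that the inline VAR=value form above is POSIX shell syntax, so it may not be accepted by every Windows shell. One shell-independent way to get the same effect is to start the launcher from Python with the variable set in the child's environment. This is a rough sketch, not part of Accelerate: `build_launch` and `launch` are hypothetical helper names, and train_script.py is the placeholder script name from the command above.

```python
import os
import subprocess


def build_launch(script, gpu_ids="0"):
    """Return (command, environment) for an `accelerate launch` child process."""
    env = os.environ.copy()  # keep PATH and everything else unchanged
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids  # expose only these GPUs to the child
    command = ["accelerate", "launch", script]
    return command, env


def launch(script, gpu_ids="0"):
    """Run the launcher; raises CalledProcessError if it exits with an error."""
    command, env = build_launch(script, gpu_ids)
    return subprocess.run(command, env=env, check=True)
```

Calling `launch("train_script.py")` would then be the counterpart of the one-line command, with the parent process's own environment left untouched.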


Might be the same as Cuda becomes unavailable and script is excuted by multiple times · Issue #2622 · huggingface/accelerate · GitHub? Are you using conda?

This worked for me!! Thank you so much!!

Glad it worked.

If you manage to get set up on your university's compute environment, you will likely end up using SLURM for job submission. At that point you can export CUDA_VISIBLE_DEVICES=0 as an environment variable.

For me that looks like:

export CUDA_VISIBLE_DEVICES="0"

python main.py
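
If editing the job script is not convenient, the same variable can also be set from inside the Python entry point. The sketch below assumes it runs at the very top of the script, before torch is imported, so the value is in place when CUDA is initialised; `pin_gpus` is a hypothetical helper name, and by default it leaves alone a value already exported by the shell or the scheduler.

```python
import os


def pin_gpus(gpu_ids="0", override=False):
    """Set CUDA_VISIBLE_DEVICES for this process and return the effective value."""
    # Respect a value that is already set unless the caller asks to override it.
    if override or "CUDA_VISIBLE_DEVICES" not in os.environ:
        os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this before the first `import torch` in the script.
effective = pin_gpus("0")
print(f"CUDA_VISIBLE_DEVICES={effective}")
```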