HF Accelerate uses multiple GPUs even when setting `num_processes` to 1

seanswyi · August 2, 2024, 12:52am

My machine has four GPUs. I’m launching the training script with `accelerate launch --num_processes 1 train.py`, but I’m getting a device mismatch error saying that some tensors are on device 0 when they should be on device 2, or something like that.
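One way to narrow down a mismatch like this is to list which tensors sit on which device. The following is a minimal sketch, assuming a PyTorch model in `train.py`; `devices_by_name` is a hypothetical helper name, not part of Accelerate or PyTorch.

```python
from collections import defaultdict


def devices_by_name(named_tensors):
    """Group tensor names by the device each tensor lives on.

    `named_tensors` is any iterable of (name, tensor) pairs where each
    tensor has a `.device` attribute, e.g. `model.named_parameters()`.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        # str() turns a device object into a label such as "cuda:0".
        groups[str(tensor.device)].append(name)
    return dict(groups)


# Usage inside train.py (assumes `model` is a torch.nn.Module):
#   for device, names in devices_by_name(model.named_parameters()).items():
#       print(device, len(names))
```

If more than one device shows up in the output of a single-process run, something other than `num_processes` is placing parts of the model on different GPUs.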

I would think that setting `num_processes` to 1 should prevent this. Is this a design choice?

I can prevent this error by adding `CUDA_VISIBLE_DEVICES=$DEVICE_ID` in front of the launch command, but that’s a separate issue.
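The workaround can also be scripted. This is a rough sketch that builds the same launch command with `CUDA_VISIBLE_DEVICES` set for the child process only; `build_launch` and its defaults are hypothetical names used for illustration.

```python
import os


def build_launch(device_id, script="train.py"):
    """Return (command, environment) for a single-GPU accelerate launch.

    Only the GPU given by `device_id` is made visible to the launched
    process, which then sees that GPU as device 0.
    """
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(device_id))
    cmd = ["accelerate", "launch", "--num_processes", "1", script]
    return cmd, env


# To actually run it (requires accelerate to be installed):
#   import subprocess
#   cmd, env = build_launch(2)
#   subprocess.run(cmd, env=env, check=True)
```

Because the variable is set only in the child's environment, other jobs started from the same shell still see all four GPUs.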

Thanks!

