HF Accelerate uses multiple GPUs even when setting `num_processes` to 1

My machine has four GPUs. I launch the training script with `accelerate launch --num_processes 1 train.py`. However, I get a device mismatch error saying that some tensors are on device 0 when they should be on device 2, or something like that.

I would think that setting `num_processes` to 1 should prevent this. Is this a design choice?

I can prevent this error by adding `CUDA_VISIBLE_DEVICES=$DEVICE_ID` in front of the launch command, but that’s a separate issue.
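
For reference, the same workaround can be sketched inside the script itself instead of on the command line. This is only a minimal sketch: `pin_to_single_gpu` is a hypothetical helper name, not part of Accelerate, and it assumes the variable is set before anything initializes CUDA (that is, before `torch` or `accelerate` touch the GPU).

```python
import os


def pin_to_single_gpu(device_id: int) -> str:
    """Hypothetical helper: expose only one physical GPU to this process.

    Must run before CUDA is initialized, otherwise it has no effect.
    """
    value = str(device_id)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Put this at the very top of train.py, before importing torch or accelerate.
pin_to_single_gpu(2)

# From here on, the process should see a single device, and physical GPU 2
# is addressed as "cuda:0" inside the script.
# import torch
# from accelerate import Accelerator
```

With only one device visible, there is no second GPU for tensors to end up on, which is why the mismatch error goes away. It does not explain why more than one GPU is used when `num_processes` is 1.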
