My machine has four GPU devices. When I run the training script with `accelerate launch --num_processes 1 train.py`, I get a device-mismatch error saying that some tensors are on device 0 when they should be on device 2 (or something like that).
I would have thought that setting `num_processes` to 1 would prevent this. Is this a design choice?
I can prevent the error by prefixing the launch command with `CUDA_VISIBLE_DEVICES=$DEVICE_ID`, but that feels like a separate issue.
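For reference, this is what my workaround looks like (the device id 2 here is just an example; `$DEVICE_ID` would be whichever GPU I want to pin the run to):

```shell
# Restrict which GPU is visible to the process before launching.
# With only one device visible, it is remapped to cuda:0 inside the
# process, so nothing can end up on a different device.
CUDA_VISIBLE_DEVICES=2 accelerate launch --num_processes 1 train.py
```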
Thanks!