How to restrict training to one GPU if multiple are available

I have multiple GPUs available in my environment, but I only want to train on one GPU.

It looks like the default setting local_rank=-1 turns off distributed training.

However, I’m a bit confused by the latest version of the code.

If local_rank == -1, then I would expect n_gpu to be one, but it’s being set to torch.cuda.device_count(), while the device is set to cuda:0.
And if local_rank is anything else, n_gpu is set to one. I thought maybe the meaning of local_rank had changed, but looking at the main training code, it doesn’t appear so.
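The behavior described above can be sketched as follows. This is a hypothetical simplification for illustration, not the actual Trainer source; the function name setup_device and the device_count parameter (standing in for torch.cuda.device_count()) are assumptions.

```python
def setup_device(local_rank: int, device_count: int):
    """Illustrative sketch of the device-selection logic described above."""
    if local_rank == -1:
        # Non-distributed case: the default device is cuda:0, but n_gpu is
        # the total device count, so all GPUs may still be used (e.g. via
        # DataParallel) even though only one device is named.
        device = "cuda:0" if device_count > 0 else "cpu"
        n_gpu = device_count
    else:
        # Distributed case: each process is pinned to one GPU.
        device = f"cuda:{local_rank}"
        n_gpu = 1
    return device, n_gpu
```

This is why local_rank=-1 does not imply n_gpu == 1: the single device string only names the default device, while n_gpu still reflects every visible GPU.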

You can use the CUDA_VISIBLE_DEVICES environment variable to control which GPUs are visible to the command you run. For instance:

# Only make GPUs #0 and #1 visible to the python script
CUDA_VISIBLE_DEVICES=0,1 python train.py <args>
# Only make GPU #3 visible to the script
CUDA_VISIBLE_DEVICES=3 python train.py <args>

Do you have any suggestions for the case when setting CUDA_VISIBLE_DEVICES is not an option?
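If setting the variable on the command line isn’t possible, one common workaround (an assumption about your setup, not something from this thread) is to set it from inside the script itself, before torch or anything else initializes CUDA:

```python
import os

# Must run before torch (or anything that initializes CUDA) is imported;
# once the CUDA context exists, changing this variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import torch  # after this point, torch would see only GPU #0
```

The ordering matters: CUDA reads the variable once at initialization, so setting it after importing torch (or after any CUDA call) silently does nothing.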

UPD: Setting trainer.args._n_gpu = 1 worked in my case, but it seems wrong to reassign an attribute, especially an underscore-prefixed (private) one.