I have multiple GPUs available in my environment, but I am trying to train on just one GPU.
It looks like the default setting local_rank=-1 turns off distributed training. However, I'm a bit confused by the latest version of the code.
If local_rank == -1, I would imagine that n_gpu would be one, but it's being set to torch.cuda.device_count(), while the device is set to cuda:0. And if local_rank is anything else, n_gpu is set to one. I was thinking maybe the meaning of local_rank had changed, but looking at the main training code, it doesn't look like it.
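To make the behavior I'm describing concrete, the device-setup logic as I read it looks roughly like the sketch below (the function name and structure are mine, not the library's; I'm only paraphrasing the two branches on local_rank):

```python
import torch

def setup_device(local_rank: int):
    """Sketch of the device setup as I understand it (names are my own)."""
    if local_rank == -1:
        # Non-distributed path: a single process, but n_gpu is the count of
        # ALL visible GPUs, while the device itself is pinned to cuda:0
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        n_gpu = torch.cuda.device_count()
    else:
        # Distributed path: each process is pinned to its own GPU,
        # so n_gpu is 1 per process
        torch.cuda.set_device(local_rank)
        device = torch.device(f"cuda:{local_rank}")
        n_gpu = 1
    return device, n_gpu
```

My confusion is exactly the first branch: with local_rank == -1 on a multi-GPU machine, n_gpu ends up greater than one even though the device is cuda:0.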