From what I can see, there are only a couple of changes needed for multi-GPU training: wrapping the model, and averaging the loss.
```python
if args.n_gpu > 1:
    model = torch.nn.DataParallel(model)
```

```python
if args.n_gpu > 1:
    loss = loss.mean()  # mean() to average on multi-gpu parallel (not distributed) training
```
Seems fairly straightforward, unless I'm missing something else.
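For anyone else landing here, a minimal sketch of where those two snippets sit in a training loop. The `ToyModel`, the `train` function, and the surrounding loop are my own illustration (not from the actual script); only the two `if args.n_gpu > 1:` blocks come from the post above:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Illustrative model that computes its own loss in forward(),
    the pattern that makes the loss.mean() step necessary."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(128, 10)

    def forward(self, inputs, labels):
        logits = self.linear(inputs)
        # Each DataParallel replica returns a 0-dim loss tensor.
        return nn.functional.cross_entropy(logits, labels)

def train(args, model, dataloader, optimizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Change 1: replicate the model across GPUs; each forward pass
    # scatters the batch along dim 0 and runs one chunk per device.
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    model.train()
    for inputs, labels in dataloader:
        loss = model(inputs.to(device), labels.to(device))

        # Change 2: DataParallel gathers one scalar loss per replica,
        # so here `loss` has shape [n_gpu]; reduce it to a single
        # scalar before calling backward().
        if args.n_gpu > 1:
            loss = loss.mean()

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The `mean()` is only needed because `DataParallel` gathers the per-replica scalar losses into a vector of length `n_gpu`; in single-GPU (or distributed, where each process computes its own scalar loss) training it's a no-op, which is why the guard can be skipped there.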