How can I take advantage of multiple GPUs?

From what I can see, there are only a couple of changes needed for multi-GPU training:

if args.n_gpu > 1:
    model = torch.nn.DataParallel(model)

if args.n_gpu > 1:
    loss = loss.mean()  # mean() to average on multi-gpu parallel (not distributed) training

Seems fairly straightforward, unless I'm missing something.
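For context, here is a minimal end-to-end sketch of how those two changes fit into a training step. The `ToyModel`, its dimensions, and the batch are made-up placeholders; the real point is that when the model returns its loss from `forward()` (as Hugging Face models do), `DataParallel` gathers one loss per GPU, so you reduce with `.mean()` before `backward()`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model that returns its loss from forward()."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, inputs, labels):
        logits = self.linear(inputs)
        # Each DataParallel replica computes and returns its own loss.
        return F.cross_entropy(logits, labels)


n_gpu = torch.cuda.device_count()
device = torch.device("cuda" if n_gpu > 0 else "cpu")

model = ToyModel().to(device)

# Change 1: wrap the model so each forward pass splits the batch across GPUs.
if n_gpu > 1:
    model = torch.nn.DataParallel(model)

inputs = torch.randn(8, 10, device=device)
labels = torch.randint(0, 2, (8,), device=device)

loss = model(inputs, labels)

# Change 2: DataParallel returns a loss per GPU, so average them
# into a scalar before calling backward().
if n_gpu > 1:
    loss = loss.mean()

loss.backward()
```

On a single GPU (or CPU) both `if` branches are skipped and the loop is unchanged, which is why the pattern is safe to leave in unconditionally.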