Running a Trainer in DistributedDataParallel mode

I am trying to train a model on four GPUs (AWS ml.p3.8xlarge). As far as I can tell, to get my model to train in DistributedDataParallel mode, I only need to specify an integer value for local_rank. But my understanding is that this will only run the training on a single GPU (whichever one I specify with local_rank).

What is the proper way to launch DistributedDataParallel training across all four GPUs using a Trainer? Do I have to launch something via the command line (as hinted at in https://github.com/huggingface/transformers/issues/1651)?

Hi @deppen8
Yes, you’ll need to launch your script with torch.distributed.launch for distributed training. It spawns one process per GPU and passes a different --local_rank to each one, so you don’t set local_rank yourself; once that value ends up in TrainingArguments, the Trainer wraps the model in DistributedDataParallel for you.
See this command for an example.
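
In case a concrete script helps, here is a minimal, self-contained sketch. The tiny model and random data are only there to make the launch mechanics visible, and the script name, sizes, and output path are arbitrary placeholders, not anything from the docs:

```python
# train_ddp.py -- launch one process per GPU with:
#
#   python -m torch.distributed.launch --nproc_per_node=4 train_ddp.py --output_dir ./ddp_out
#
import torch
from torch import nn
from torch.utils.data import Dataset
from transformers import HfArgumentParser, Trainer, TrainingArguments


class RandomRegressionDataset(Dataset):
    """Random features and targets, so the script runs without any downloads."""

    def __init__(self, n=1024, dim=16):
        self.x = torch.randn(n, dim)
        self.y = torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return {"x": self.x[i], "labels": self.y[i]}


class TinyRegressor(nn.Module):
    """Stand-in model; the Trainer expects the loss first in the output tuple."""

    def __init__(self, dim=16):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, x=None, labels=None):
        preds = self.linear(x)
        loss = nn.functional.mse_loss(preds, labels)
        return (loss, preds)


# torch.distributed.launch passes a different --local_rank to each process;
# HfArgumentParser reads it into TrainingArguments, and the Trainer then wraps
# the model in DistributedDataParallel and shards batches across the processes.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()

trainer = Trainer(
    model=TinyRegressor(),
    args=training_args,
    train_dataset=RandomRegressionDataset(),
)
trainer.train()
```

The key point is that you never hard-code local_rank: each of the four processes gets its own value from the launcher, and the Trainer only switches into DistributedDataParallel mode when it sees local_rank != -1 in its TrainingArguments. (Newer PyTorch releases ship torchrun as the successor to torch.distributed.launch.)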