I am trying to train a model on four GPUs (AWS ml.p3.8xlarge). As far as I can tell, to get my model to train with `DistributedDataParallel`, I only need to specify some integer value for `local_rank`. But my understanding is that this will only run the training on a single GPU (whichever one I specify with `local_rank`).
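For concreteness, here is a simplified sketch of the kind of training script I have in mind; the model and the tiny dummy dataset are just placeholders for my real ones:

```python
import argparse

from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

parser = argparse.ArgumentParser()
# This is the integer I was referring to; it defaults to -1 (no distribution).
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()

# Placeholder model and dummy dataset, only to make the sketch self-contained.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
train_dataset = Dataset.from_dict(
    {"input_ids": [[101, 2023, 102]] * 8, "labels": [0, 1] * 4}
)

training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    local_rank=args.local_rank,  # pass the rank through to the Trainer
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```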
What is the proper way to launch `DistributedDataParallel` training across all four GPUs using a `Trainer`? Do I have to launch something via the command line (as hinted at here: https://github.com/huggingface/transformers/issues/1651)?
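Based on that issue, my guess is that the launch would look something like the command below (with `train_script.py` standing in for my actual script), but I don't know whether this is the recommended way when using a `Trainer`:

```bash
# Guess: use the PyTorch launcher to start one process per GPU.
# train_script.py is a placeholder for my actual training script.
python -m torch.distributed.launch --nproc_per_node=4 train_script.py
```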