I am trying to train a model on four GPUs (AWS ml.p3.8xlarge). As far as I can tell, to get my model to train with DistributedDataParallel, I only need to specify some integer value for local_rank. But my understanding is that this will only run the training on a single GPU (whichever one I specify with local_rank).
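For reference, here is roughly what my setup looks like right now (the model and dataset are just placeholders here; my real script builds them properly):

```python
from transformers import Trainer, TrainingArguments

# Placeholders -- the actual script loads a real model and dataset.
model = ...
train_dataset = ...

# My current understanding: setting local_rank switches the Trainer to
# DistributedDataParallel, but it seems to tie training to a single GPU.
training_args = TrainingArguments(
    output_dir="./output",
    local_rank=0,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```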
What is the proper way to launch DistributedDataParallel training across all four GPUs using a Trainer? Do I have to launch something via the command line (as hinted at here: https://github.com/huggingface/transformers/issues/1651)?
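For example, is the expectation that I launch my training script with something along these lines (the script name is just a placeholder)?

```
python -m torch.distributed.launch --nproc_per_node=4 train_my_model.py
```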