Hello!
I am training an HF model with torch DDP using the following command line:
python -m torch.distributed.launch --nproc_per_node 2 my_script.py --{arguments}
I noticed that while training uses both available GPUs, the evaluation step runs on a single GPU only. After checking the source code, it seems that the model is not wrapped in DDP here when `training == False`.
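
To illustrate the pattern I'm referring to, here's a minimal sketch (the names and structure are mine, not the actual Trainer source):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model: torch.nn.Module, training: bool, local_rank: int):
    # Sketch of the conditional wrapping I'm describing: the DDP wrapper is
    # only applied when we are training.
    if training and torch.distributed.is_initialized():
        # Training: gradients are synchronized across the two processes.
        return DDP(model, device_ids=[local_rank], output_device=local_rank)
    # Evaluation: the bare model is returned, so each process just runs a
    # plain forward pass with no cross-GPU coordination.
    return model
```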
Is it expected that only one GPU is used during the evaluation step? If so, could you explain why DDP cannot be used for evaluation as well?
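
For context, this is roughly what I imagined distributed evaluation could look like: a sketch assuming the eval set is sharded across ranks with a `DistributedSampler` and the per-rank losses are gathered at the end (all names here are illustrative, not the Trainer's API):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

def distributed_evaluate(model, eval_dataset, device, batch_size=8):
    # Shard the eval set so each rank sees a disjoint slice.
    # (Note: DistributedSampler may pad with duplicates so every rank
    # gets the same number of batches.)
    sampler = DistributedSampler(eval_dataset, shuffle=False)
    loader = DataLoader(eval_dataset, batch_size=batch_size, sampler=sampler)

    model.eval()
    losses = []
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)
            losses.append(outputs.loss.detach())

    # Gather the per-rank mean losses so every process sees the global result.
    local_mean = torch.stack(losses).mean()
    gathered = [torch.zeros_like(local_mean) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, local_mean)
    return torch.stack(gathered).mean().item()
```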