Hello!
As far as I can see, the Trainer can now run multi-GPU training even without torchrun / python -m torch.distributed.launch / accelerate
(just by running the training script like a regular Python script: python my_script.py).
Can you tell me which algorithm it uses, DP or DDP?
And will the fsdp argument (from TrainingArguments) work correctly in this case?
It uses DP if you launch the script with python, and DDP if you launch it with torchrun. The fsdp argument will be ignored if you don’t launch the script with torchrun.
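For anyone wondering what this looks like in practice, here is a minimal sketch (the output_dir, batch size, and script name are just placeholders, not part of the original answer):

```python
from transformers import TrainingArguments

# The fsdp setting only takes effect in a proper distributed (DDP) context,
# i.e. when the script is started with torchrun, e.g.:
#   torchrun --nproc_per_node=4 my_script.py
# Launching it as "python my_script.py" falls back to DataParallel (DP),
# and the fsdp setting is silently ignored.
args = TrainingArguments(
    output_dir="out",                  # placeholder output directory
    per_device_train_batch_size=8,     # placeholder batch size
    fsdp="full_shard",                 # shard parameters/gradients/optimizer state across ranks
)
```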