Hello!
As far as I can see, the Trainer can now run multi-GPU training even without using torchrun / python -m torch.distributed.launch / accelerate
(just by running the training script like a regular Python script: python my_script.py).
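To be concrete, here is roughly the kind of script I mean; the model and dataset are just placeholders, not my real setup:

```python
# my_script.py - a minimal Trainer setup (model/dataset are placeholders)
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# small slice just to illustrate the setup
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```

And I launch it simply as `python my_script.py`, with several GPUs visible, no torchrun or accelerate launch involved.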
Can you tell me which parallelism strategy it uses in that case: DP (DataParallel) or DDP (DistributedDataParallel)?
And will the fsdp argument (from TrainingArguments) work correctly in this case?
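By the fsdp argument I mean something like this (just a sketch, the values are examples):

```python
# same script as above, but with FSDP requested via TrainingArguments
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    fsdp="full_shard",  # the fsdp argument in question
)
```

Would this still take effect when the script is launched with plain `python my_script.py`, or does it require one of the distributed launchers?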