Hello!
As far as I can see, the Trainer can now run multi-GPU training even without torchrun / python -m torch.distributed.launch / accelerate
(just by running the training script like a regular Python script: python my_script.py).
Can you tell me which algorithm it uses, DP or DDP?
And will the fsdp argument (from TrainingArguments) work correctly in this case?
It uses DP if you launch the script with python, and DDP if you launch it with torchrun. The fsdp argument will be ignored if you don’t launch the script with torchrun.
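For anyone wondering what this looks like in practice, here is a minimal sketch (the output_dir, batch size, and script name are just placeholders, not part of the original answer):

```python
from transformers import TrainingArguments

# The fsdp setting only takes effect in a proper distributed (DDP) context,
# i.e. when the script is started with torchrun, e.g.:
#   torchrun --nproc_per_node=4 my_script.py
# Launching it as "python my_script.py" falls back to DataParallel (DP),
# and the fsdp setting is silently ignored.
args = TrainingArguments(
    output_dir="out",                  # placeholder output directory
    per_device_train_batch_size=8,     # placeholder batch size
    fsdp="full_shard",                 # shard parameters/gradients/optimizer state across ranks
)
```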