Hello!
As far as I can see, the Trainer can now run multi-GPU training even without using torchrun / python -m torch.distributed.launch / accelerate
(just by running the training script like a regular Python script: python my_script.py).
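To be concrete, here is roughly the kind of script I mean; the model and dataset are just placeholders, not my real setup:

```python
# my_script.py - a minimal Trainer setup (model/dataset are placeholders)
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# small slice just to illustrate the setup
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```

And I launch it simply as `python my_script.py`, with several GPUs visible, no torchrun or accelerate launch involved.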
Can you tell me which parallelism strategy it uses in that case: DP (DataParallel) or DDP (DistributedDataParallel)?
And will the fsdp argument (from TrainingArguments) work correctly in this case?
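By the fsdp argument I mean something like this (just a sketch, the values are examples):

```python
# same script as above, but with FSDP requested via TrainingArguments
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    fsdp="full_shard",  # the fsdp argument in question
)
```

Would this still take effect when the script is launched with plain `python my_script.py`, or does it require one of the distributed launchers?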