How does DDP + huggingface Trainer handle input data?

I’m launching my training script with `python -m torch.distributed.launch --nproc_per_node=6`. The script was adapted from the `run_clm.py` example in huggingface/transformers on GitHub. During training, it’s important for me that the data remain in the same order and that the scattering of data across processes is also deterministic. Is that currently the case with how the huggingface Trainer automatically handles the `torch.distributed.launch` flag?

It depends on what you mean by “the data needs to remain in the same order”. The dataset is the same on each process (so in the same order), and in the DataLoader the samples are shuffled the same way on every process (same seed). Each process then sees 1/6th of the samples.
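You can see this sharding behavior with PyTorch’s `DistributedSampler`, which is what the Trainer relies on under the hood. The sketch below simulates the 6 ranks from the launch command in a single process by passing `num_replicas` and `rank` explicitly (so no process group is needed); the dataset size and seed are arbitrary choices for the demo:

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 12 samples; imagine the 6 processes from --nproc_per_node=6.
dataset = TensorDataset(torch.arange(12))

for rank in range(6):
    # Passing num_replicas/rank explicitly avoids needing an initialized
    # process group for this single-process demo.
    sampler = DistributedSampler(
        dataset, num_replicas=6, rank=rank, shuffle=True, seed=42
    )
    sampler.set_epoch(0)  # same epoch => identical shuffle on every rank
    print(rank, list(sampler))
```

Because every rank builds the same seeded permutation and then takes its own slice of it, the 6 index lists are disjoint and together cover the whole dataset.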


Is there a way to keep data instances in their original order within each batch, but shuffle the order of the batches?
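One way to get that behavior is a custom batch sampler: group consecutive indices into batches first, then shuffle only the batch order. The `ShuffledBatchSampler` below is a hypothetical helper (not part of transformers or the Trainer), shown single-process; under DDP you would still need to shard the batches across ranks:

```python
import random
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class ShuffledBatchSampler(Sampler):
    """Yield batches of consecutive indices, in a shuffled batch order.
    Hypothetical sketch; instances inside a batch keep dataset order."""

    def __init__(self, dataset_len, batch_size, seed=0):
        self.batches = [
            list(range(i, min(i + batch_size, dataset_len)))
            for i in range(0, dataset_len, batch_size)
        ]
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Change the shuffle each epoch while staying reproducible.
        self.epoch = epoch

    def __iter__(self):
        g = random.Random(self.seed + self.epoch)
        order = list(range(len(self.batches)))
        g.shuffle(order)  # shuffle batch order, not sample order
        for b in order:
            yield self.batches[b]

    def __len__(self):
        return len(self.batches)

dataset = TensorDataset(torch.arange(10))
loader = DataLoader(
    dataset, batch_sampler=ShuffledBatchSampler(len(dataset), batch_size=2)
)
for (xs,) in loader:
    print(xs.tolist())  # each batch holds consecutive samples
```

To use something like this with the Trainer you would have to override its dataloader construction (e.g. `get_train_dataloader`), since by default it builds its own sampler.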