How does DDP + huggingface Trainer handle input data?

I’m launching my training script with `python -m torch.distributed.launch --nproc_per_node=6`. The script was adapted from the `run_clm.py` example in huggingface/transformers on GitHub. During training, it’s important for me that the data remain in the same order and that the scattering of data across processes is also deterministic. Is that currently the case with how the huggingface Trainer automatically handles the `torch.distributed.launch` flag?

It depends on what you mean by “the data needs to remain in the same order”. The dataset is the same on each process (so in the same order), and in the DataLoader the samples are shuffled the same way on every process (same seed). Each process then sees 1/6th of the samples.
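You can see this sharding behavior with PyTorch’s `DistributedSampler`, which is what the Trainer relies on under the hood. The sketch below simulates the 6 ranks from the launch command in a single process by passing `num_replicas` and `rank` explicitly (so no process group is needed); the dataset size and seed are arbitrary choices for the demo:

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 12 samples; imagine the 6 processes from --nproc_per_node=6.
dataset = TensorDataset(torch.arange(12))

for rank in range(6):
    # Passing num_replicas/rank explicitly avoids needing an initialized
    # process group for this single-process demo.
    sampler = DistributedSampler(
        dataset, num_replicas=6, rank=rank, shuffle=True, seed=42
    )
    sampler.set_epoch(0)  # same epoch => identical shuffle on every rank
    print(rank, list(sampler))
```

Because every rank builds the same seeded permutation and then takes its own slice of it, the 6 index lists are disjoint and together cover the whole dataset.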


Is there a way to keep data instances in their original order within each batch, but shuffle the order of the batches?
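One way to get that behavior is a custom batch sampler: group consecutive indices into batches first, then shuffle only the batch order. The `ShuffledBatchSampler` below is a hypothetical helper (not part of transformers or the Trainer), shown single-process; under DDP you would still need to shard the batches across ranks:

```python
import random
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class ShuffledBatchSampler(Sampler):
    """Yield batches of consecutive indices, in a shuffled batch order.
    Hypothetical sketch; instances inside a batch keep dataset order."""

    def __init__(self, dataset_len, batch_size, seed=0):
        self.batches = [
            list(range(i, min(i + batch_size, dataset_len)))
            for i in range(0, dataset_len, batch_size)
        ]
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Change the shuffle each epoch while staying reproducible.
        self.epoch = epoch

    def __iter__(self):
        g = random.Random(self.seed + self.epoch)
        order = list(range(len(self.batches)))
        g.shuffle(order)  # shuffle batch order, not sample order
        for b in order:
            yield self.batches[b]

    def __len__(self):
        return len(self.batches)

dataset = TensorDataset(torch.arange(10))
loader = DataLoader(
    dataset, batch_sampler=ShuffledBatchSampler(len(dataset), batch_size=2)
)
for (xs,) in loader:
    print(xs.tolist())  # each batch holds consecutive samples
```

To use something like this with the Trainer you would have to override its dataloader construction (e.g. `get_train_dataloader`), since by default it builds its own sampler.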