How to ensure the dataset is shuffled for each epoch using Trainer and Datasets?

The Seq2SeqTrainer (as well as the standard Trainer) uses a PyTorch Sampler to shuffle the dataset. At each epoch it reshuffles the dataset, and it can also group samples of roughly the same length into the same batches. You can find the Sampler definition here.
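To see why the order changes every epoch, here is a minimal sketch in plain PyTorch (not the Trainer's actual internals): a RandomSampler draws a fresh permutation every time the dataloader is iterated, so each epoch sees a different order. The toy list of integers is just a stand-in for a tokenized 🤗 Datasets dataset.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler

dataset = list(range(10))  # stand-in for a tokenized dataset

# RandomSampler re-draws a permutation each time the loader is iterated,
# i.e. once per epoch in a typical training loop.
loader = DataLoader(dataset, batch_size=4, sampler=RandomSampler(dataset))

for epoch in range(2):
    order = [int(x) for batch in loader for x in batch]
    print(f"epoch {epoch}: {order}")  # a different permutation each epoch
```

Note that grouping samples of roughly the same length is controlled by `group_by_length=True` in `TrainingArguments`/`Seq2SeqTrainingArguments` (check the docs for your transformers version); with it enabled the Trainer swaps the plain random sampler for its length-grouping sampler, but the order is still re-randomized every epoch.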
