Is IterableDataset automatically reshuffled after each epoch in Trainer?

I’m using an IterableDataset from the datasets library and passing it to the HF Trainer, something like this:

from datasets import load_dataset
from transformers import Trainer, TrainingArguments

ds = load_dataset("my-dataset", streaming=True)

training_args = TrainingArguments(
    output_dir="my_model",
    per_device_train_batch_size=8,
    max_steps=1000,  # Large enough for multiple epochs
)

trainer = Trainer(
    model=my_model,
    args=training_args,  # was missing before; training_args was defined but never used
    train_dataset=ds["train"].shuffle(seed=42),
    eval_dataset=ds["test"],
)
trainer.train()

So: will the Trainer automatically reshuffle my iterable dataset after each epoch?
I can't find this behavior in the docs, but I did find some code that forces the Trainer to reshuffle via a callback (so I guess the Trainer does not reshuffle the dataset on its own?).
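For context, my understanding of the mechanism is that a shuffled streaming dataset derives its shuffle order from its base seed combined with the current epoch, and something has to call `set_epoch` each epoch for the order to change. Here is a stdlib-only toy (not the real `datasets` API; `ShuffledIterable` is a made-up stand-in) illustrating why the order never changes unless `set_epoch` is called:

```python
import random

class ShuffledIterable:
    """Toy stand-in for a shuffled IterableDataset: iteration order
    is derived from (seed, epoch), so replaying without bumping the
    epoch yields the exact same order every time."""

    def __init__(self, items, seed=42):
        self.items = list(items)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # If nothing calls this between epochs, every "epoch"
        # replays the epoch-0 order.
        self.epoch = epoch

    def __iter__(self):
        order = list(self.items)
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)

ds = ShuffledIterable(range(20), seed=42)
epoch0 = list(ds)
replay = list(ds)          # epoch not bumped: identical order
ds.set_epoch(1)
epoch1 = list(ds)          # epoch bumped: same examples, new order

assert epoch0 == replay
assert sorted(epoch0) == sorted(epoch1)
assert epoch0 != epoch1
```

So the callback-based approach presumably just calls `set_epoch` on the train dataset at each epoch boundary, which is what this toy models.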
