I am using the Hugging Face implementation of IterableDataset, which provides a
set_epoch method, together with the standard Trainer class. However, during
training the dataset's _epoch attribute is never changed (https://github.com/huggingface/datasets/blob/0cc77d7f45c73698c31eab4f8cfff901044d0020/src/datasets/iterable_dataset.py#L1829).
The Trainer docs say that an IterableDataset should “have a set_epoch()
method that internally sets the seed of the RNGs used”. I'm not sure how to use this if Trainer doesn't call set_epoch() itself at the start of every epoch.
Should there be another option in IterableDatasetShard to resolve this? https://github.com/huggingface/transformers/blob/63864e057fd4ecbf54c77599702873f7be871e65/src/transformers/trainer_pt_utils.py#L809
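
In the meantime, one workaround I am considering is a small callback that forwards the epoch to the dataset by hand. This is only a minimal sketch under assumptions: SetEpochCallback and ToyShuffledDataset are hypothetical names I made up for illustration, the callback keeps its own epoch counter (so it ignores resuming from a checkpoint), and I have not checked whether some Trainer versions already forward the epoch on their own. The only library piece it relies on is the TrainerCallback.on_epoch_begin hook.

```python
import random

try:
    from transformers import TrainerCallback
except ImportError:
    # Fallback so the sketch still runs without transformers installed.
    TrainerCallback = object


class ToyShuffledDataset:
    """Hypothetical stand-in for an iterable dataset with set_epoch()."""

    def __init__(self, n, seed=0):
        self.n = n
        self.seed = seed
        self._epoch = 0

    def set_epoch(self, epoch):
        # The epoch is mixed into the seed so each epoch can shuffle differently.
        self._epoch = epoch

    def __iter__(self):
        rng = random.Random(self.seed + self._epoch)
        order = list(range(self.n))
        rng.shuffle(order)
        return iter(order)


class SetEpochCallback(TrainerCallback):
    """Hypothetical helper: calls dataset.set_epoch() at each epoch start."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.epoch = 0  # own counter; assumes training starts at epoch 0

    def on_epoch_begin(self, args, state, control, **kwargs):
        # Only forward the epoch if the dataset actually supports it.
        if hasattr(self.dataset, "set_epoch"):
            self.dataset.set_epoch(self.epoch)
        self.epoch += 1
```

With a real setup I would expect to pass it as something like Trainer(..., train_dataset=ds, callbacks=[SetEpochCallback(ds)]). Whether the dataset object held by the callback is the same one the dataloader iterates over (for example when it gets wrapped in IterableDatasetShard or copied into dataloader workers) is something I have not verified.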