Limitations of iterable datasets

Hi @adrienchaton,

I noticed something similar as well: spiky convergence when training on streamed data with an IterableDataset, compared to a non-streamed, map-style local dataset.

It may be worth checking whether using ShufflerIterDataPipe to shuffle the samples fed to the iterable DataLoader helps resolve your issue.

For example something like this:

from torch.utils.data import DataLoader
from torch.utils.data.datapipes.iter.combinatorics import ShufflerIterDataPipe

# Wrap the dataset in a shuffle buffer; by default ShufflerIterDataPipe
# holds up to 10000 samples in memory (tunable via its buffer_size argument).
shuffled_dataset = ShufflerIterDataPipe(your_torch_dataset)

train_dataloader = DataLoader(shuffled_dataset, shuffle=True, batch_size=8)

I have been working through it with the Hugging Face team and documenting my results in this thread: Streaming Dataset of Sequence Length 2048 - #7 by loubnabnl

Hope this helps.

Best.