We can use `skip_first_batches()` to restore the state of the dataloader (as mentioned here). But what if we use `shuffle=True` and want to restore the dataloader's position after several epochs, which involves multiple reshufflings? How can we ensure the same data order after calling `accelerator.load_state()` and then skip the correct number of batches?
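For context, here is roughly the resume pattern I'm starting from, which works fine without shuffling. It's only a sketch: the `"ckpt"` directory name and the `batches_done` counter are my own placeholders that I track myself, not something Accelerate records for me.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(100, 10),
                                         torch.randint(0, 2, (100,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=5, shuffle=False)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# ... training is interrupted partway through an epoch ...
batches_done = 7                # I record this myself alongside the checkpoint
accelerator.save_state("ckpt")  # "ckpt" is just a placeholder directory name

# --- later, on resume ---
accelerator.load_state("ckpt")  # restores model / optimizer / RNG states
skipped = accelerator.skip_first_batches(dataloader, batches_done)
for batch in skipped:
    ...                         # finish the interrupted epoch, then iterate normally
```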
For example:
- If we use `shuffle=False` in the dataloader, the data is ordered the same way in every epoch:
  - 1st epoch: 1 2 3 4 5
  - 2nd epoch: 1 2 3 4 5

  So, in this case, we don't need to worry about shuffling and can simply remember how many batches were processed in the current epoch.
- However, if we use `shuffle=True`, the order of the data changes with each epoch:
  - 1st epoch: 4 2 3 1 5
  - 2nd epoch: 5 1 2 3 4
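A tiny repro of this behaviour (the exact orders are just illustrative; they depend on the RNG state at the time each epoch starts):

```python
import torch

# With shuffle=True, every pass over the DataLoader draws a fresh permutation,
# so skipping N batches only lands on the same samples if the RNG is in the
# same state as in the original run.
dataset = torch.arange(1, 6)    # items 1..5, as in the example above
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)

for epoch in range(2):
    order = [int(b) for b in loader]
    print(f"epoch {epoch + 1}: {order}")   # e.g. [4, 2, 3, 1, 5] then [5, 1, 2, 3, 4]
```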
Suppose we stopped halfway through the second epoch. How can we restore this exact state using `accelerator.load_state()` and `accelerator.skip_first_batches()`?
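Here is a sketch of the resume logic I have in mind. I'm assuming that re-seeding the DataLoader's `generator` at the start of each epoch makes the shuffle order a pure function of `(base_seed, epoch)`, so on resume I can re-seed for the interrupted epoch and then skip the batches already seen. The seeding scheme, the `resume_epoch` / `batches_done` bookkeeping, and the `"ckpt"` directory are all my own inventions, and I'm not sure whether `accelerator.prepare()` replaces or synchronizes this generator. Is this the right approach, or is there a built-in way?

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(100, 10),
                                         torch.randint(0, 2, (100,)))

base_seed = 42
generator = torch.Generator()
dataloader = torch.utils.data.DataLoader(dataset, batch_size=5, shuffle=True,
                                         generator=generator)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Values I would save myself next to the checkpoint (Accelerate's save_state
# does not know about my epoch/batch counters).
resume_epoch, batches_done = 1, 2   # stopped 2 batches into the 2nd epoch (index 1)
accelerator.load_state("ckpt")      # assumes a checkpoint saved earlier with save_state("ckpt")

num_epochs = 5
for epoch in range(resume_epoch, num_epochs):
    generator.manual_seed(base_seed + epoch)    # try to reproduce this epoch's shuffle
    if epoch == resume_epoch:
        epoch_loader = accelerator.skip_first_batches(dataloader, batches_done)
    else:
        epoch_loader = dataloader
    for batch in epoch_loader:
        ...                                     # training step
```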