After quite a bit of digging, I realized that when I call `dataloader, model = accelerator.prepare(dataloader, model)`, the `even_batches` argument defaults to `True`, and this is expected behavior in that case: my last batch cannot be split evenly over the number of devices, so a few samples are taken from the start of the dataset to pad it out. However:
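To make the behavior concrete, here is a minimal sketch (hypothetical helper name, not Accelerate's actual code) of what `even_batches=True` padding looks like: the sample indices are padded by wrapping around to the start of the dataset so every process gets the same count.

```python
def pad_to_even_batches(indices, num_processes):
    # Hypothetical sketch of even_batches=True: if the samples do not
    # divide evenly across processes, duplicate samples from the start
    # of the dataset to fill the shortfall.
    remainder = len(indices) % num_processes
    if remainder:
        indices = indices + indices[: num_processes - remainder]
    return indices

print(pad_to_even_batches(list(range(10)), 4))
# the last two entries are samples 0 and 1, duplicated from the start
```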
When I create my accelerator with `accelerator = Accelerator(even_batches=False)`, this does not do anything! I can confirm that `even_batches` is set to `False`, and I get no warning about this.
Digging a bit more, the `dataloader` returned by `prepare` is actually a `DataLoaderDispatcher`, which does not seem to care about the `even_batches` argument. I can see here that it simply rounds up the batch size, and here that it then samples from the first batch.
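As I understand the code path, the effect is roughly the following (a simplified sketch with hypothetical names, not the actual `DataLoaderDispatcher` implementation): the number of batches is rounded up, and the final short batch is topped up with samples taken from the first batch.

```python
import math

def dispatched_batches(indices, batch_size):
    # Sketch of the behavior described above: round up the batch count,
    # then fill the final short batch from the first batch.
    n_batches = math.ceil(len(indices) / batch_size)
    batches = []
    for i in range(n_batches):
        batch = indices[i * batch_size:(i + 1) * batch_size]
        if len(batch) < batch_size:
            # shortfall is filled with samples from the first batch
            batch = batch + indices[: batch_size - len(batch)]
        batches.append(batch)
    return batches

print(dispatched_batches(list(range(10)), 4))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

Note that nothing in this path consults `even_batches`, which matches what I observe.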
So, how can I get the accelerated dataloader to stop doing this? And could the docs be updated to reflect the fact that setting `even_batches` to `False` does nothing here? (I suspect it does have an effect with certain underlying dataset types, but I'm not sure.)