I am using the transformers library's
Seq2SeqTrainer class to train a seq2seq model.
I am using the following parameters in the
Seq2SeqTrainingArguments class, amongst others:
- gradient_accumulation_steps: 8
- per_device_train_batch_size: 2
- dataloader_drop_last: False
I am also using 2 GPUs for the training.
Therefore the total effective batch size is 8 (accumulation steps) × 2 (per-device batch size) × 2 (GPUs) = 32.
I have noticed the following: if I train with a dataset of 32 samples, the model performs 1 optimization step per epoch (as expected), and with a dataset of 64 samples it performs 2 optimization steps (as expected). However, with a dataset of 60 samples (or any number between 33 and 63), it performs only 1 optimization step, whereas I would expect 2, since one optimization step can only cover 32 samples and I set dataloader_drop_last to False, so the final partial batch should not be discarded.
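The counts I observe match plain floor division of the dataset size by the effective batch size, while with dataloader_drop_last=False I expected rounding up. A small sketch of the arithmetic (the constants mirror my setup):

```python
import math

# Constants mirroring the setup above.
PER_DEVICE_BATCH = 2   # per_device_train_batch_size
NUM_GPUS = 2
GRAD_ACCUM = 8         # gradient_accumulation_steps
EFFECTIVE_BATCH = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM  # 32

def steps_floor(num_samples: int) -> int:
    # Matches what I observe: the partial final
    # accumulation window is dropped.
    return num_samples // EFFECTIVE_BATCH

def steps_ceil(num_samples: int) -> int:
    # What I expected with dataloader_drop_last=False:
    # the leftover samples still trigger a final optimizer step.
    return math.ceil(num_samples / EFFECTIVE_BATCH)

for n in (32, 33, 60, 63, 64):
    print(n, steps_floor(n), steps_ceil(n))
```

For every dataset size from 33 to 63 the two functions disagree (1 vs. 2 steps), which is exactly the range where the Trainer's reported step count surprised me.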
See the extracted console printout below:
***** Running training *****
  Num examples = 60
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 8
  Total optimization steps = 1
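For reference, the relevant arguments are set roughly like this (a sketch, not my full script; output_dir is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the argument setup described above;
# output_dir is a placeholder path.
args = Seq2SeqTrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    dataloader_drop_last=False,
    num_train_epochs=1,
)
```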
Can anyone shed light on why this is happening? Is this expected behaviour, or am I doing something wrong?
Happy to provide a short reproducible code example if required.