Is there a standard way to handle leftover batches when using gradient accumulation?

sgugger · November 22, 2021, 12:30pm

You should use a step counter that runs over the whole training loop instead of the per-epoch step counter, so that the accumulated batch left unfinished at the end of epoch 0 gets completed during epoch 1. Unless your dataset is pretty small, the probability of having the same sample twice in that batch is not very high.
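To make that concrete, here is a minimal, framework-free sketch of the counting logic only. The function name `optimizer_update_points` and its parameters are made up for illustration. The comments mark where `loss.backward()` and `optimizer.step()` / `optimizer.zero_grad()` would go in a real PyTorch-style loop, which is an assumption about your setup.

```python
def optimizer_update_points(num_epochs, batches_per_epoch, accumulation_steps):
    """Return the (epoch, batch_index) pairs at which an optimizer step fires.

    Hypothetical helper: it only simulates the counters, no model is trained.
    """
    updates = []
    global_step = 0  # counts every batch across all epochs, never reset per epoch
    for epoch in range(num_epochs):
        for batch_index in range(batches_per_epoch):
            # In a real loop: compute the loss and call loss.backward() here,
            # letting gradients accumulate.
            global_step += 1
            if global_step % accumulation_steps == 0:
                # In a real loop: optimizer.step(); optimizer.zero_grad()
                updates.append((epoch, batch_index))
    return updates


# 10 batches per epoch with 4 accumulation steps leaves 2 batches over in epoch 0.
updates = optimizer_update_points(num_epochs=2, batches_per_epoch=10, accumulation_steps=4)
print(updates)  # [(0, 3), (0, 7), (1, 1), (1, 5), (1, 9)]
```

The third update fires at batch index 1 of epoch 1, so the two leftover batches of epoch 0 are combined with the first two batches of epoch 1. Note that this only carries leftovers across epoch boundaries: if the total number of batches over the whole run is not a multiple of `accumulation_steps`, a few batches still remain at the very end of training.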
