This is an interesting question! I don't think increasing the batch size while reusing the optimizer states from the previous training run would be an issue. You could either carry over the old optimizer states or start with fresh gradient statistics; both should work, imo.
Worst case, if you're limited on compute, you could resume with a small learning rate for a bit and then increase it once you're confident the gradient statistics have adapted to the new batch size. I haven't seen this as a formal recommendation anywhere; it's more of a cheap "might as well" precaution.
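
If it helps, here's a rough sketch of what I mean in PyTorch. The checkpoint keys, warmup length, toy model, and placeholder loss are all just assumptions for illustration, not anything from your setup:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; swap in your own model, data, and checkpoint path.
model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

ckpt = torch.load("checkpoint.pt")            # assumed checkpoint layout
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])  # keep old Adam moments; skip this line for fresh stats

target_lr = 3e-4
warmup_steps = 500  # arbitrary; a few hundred large-batch steps is usually plenty

def warmup_lr(step: int) -> float:
    """Linear ramp from 10% of the target LR back to the full value."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (0.1 + 0.9 * step / warmup_steps)

# Resumed training loop; the data loader now yields the larger batches.
for step in range(1000):
    batch = torch.randn(1024, 128)            # placeholder for a real large batch
    for group in optimizer.param_groups:
        group["lr"] = warmup_lr(step)
    loss = model(batch).pow(2).mean()         # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The warmup just buys the second-moment estimates a few hundred steps to settle under the new batch size before you go back to the full learning rate; if you start from fresh optimizer states instead, the same ramp does no harm.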