Batch size vs gradient accumulation

Isn't it the opposite? Batching inputs is what drives memory usage up, not gradient accumulation. If gradient accumulation triggers an OOM (out-of-memory) error, the equivalent single large batch is guaranteed to fail with the same error, since accumulation only ever holds one micro-batch's activations at a time while the full batch holds all of them at once.
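To illustrate the equivalence, here's a framework-free sketch (plain Python, toy linear model with a made-up MSE loss, all names hypothetical) showing that accumulating gradients over micro-batches reproduces the full-batch gradient while only ever processing one micro-batch at a time:

```python
def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over the given batch
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full batch: all 4 samples processed at once (higher peak memory).
full_grad = grad_mse(w, xs, ys)

# Gradient accumulation: two micro-batches of 2; each micro-batch's
# gradient is scaled by 1/accum_steps and summed, mimicking loss / accum_steps.
accum_steps = 2
accum_grad = 0.0
for i in range(0, len(xs), 2):
    accum_grad += grad_mse(w, xs[i:i + 2], ys[i:i + 2]) / accum_steps

print(full_grad, accum_grad)  # both -22.5: identical gradients
```

The two gradients match exactly, which is why accumulation can substitute for a large batch when the large batch would OOM, but never the other way around.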
