Batch size, gradient accumulation steps for Linear schedule

jordiclive · May 1, 2021, 3:35pm

I am trying to recreate a paper's results. The authors used a batch size of 16 for 30 epochs, with 3 gradient accumulation steps.

But I can only fit a batch size of 8 on my GPU.

I am using a linear warmup scheduler:

get_linear_schedule_with_warmup(
    self.opt,
    num_warmup_steps=0,
    num_training_steps=(dataset_size / effective_batch_size) * self.hparams.max_epochs,
)

How should I adjust max epochs and gradient accumulation steps if I am using half the batch size?

Should I train for the same number of epochs but increase the gradient accumulation steps?
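
To make the arithmetic concrete, here is a rough sketch of how I understand the numbers. It assumes the effective batch size is the per-step batch size times the accumulation steps, that the scheduler is stepped once per optimizer update rather than once per micro-batch, and that a leftover partial batch still counts as one update (that last part depends on the training loop). The helper names and dataset_size = 4800 are made up for illustration.

```python
import math


def effective_batch_size(per_step_batch_size: int, accumulation_steps: int) -> int:
    # Number of samples that contribute to a single optimizer update.
    return per_step_batch_size * accumulation_steps


def total_optimizer_steps(
    dataset_size: int,
    per_step_batch_size: int,
    accumulation_steps: int,
    max_epochs: int,
) -> int:
    # Optimizer updates per epoch, rounded up so a leftover partial batch
    # counts as one update (an assumption, not every loop does this).
    per_epoch = math.ceil(
        dataset_size / effective_batch_size(per_step_batch_size, accumulation_steps)
    )
    return per_epoch * max_epochs


dataset_size = 4800  # hypothetical value, not from the paper

# Paper setup: batch size 16, 3 accumulation steps, 30 epochs.
paper_steps = total_optimizer_steps(dataset_size, 16, 3, 30)

# Half the batch size with double the accumulation steps, same epochs.
my_steps = total_optimizer_steps(dataset_size, 8, 6, 30)

print(effective_batch_size(16, 3), effective_batch_size(8, 6))  # 48 48
print(paper_steps, my_steps)  # 3000 3000
```

Under these assumptions, batch size 8 with 6 accumulation steps gives the same effective batch size (48) and the same num_training_steps over 30 epochs as the paper's setup, so the linear schedule would decay over the same number of updates.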
