Hi there! I'm doing DDP-style training with the Trainer. I've set up my accelerate config to use 4 GPUs, and I have set these args for training:
per_device_train_batch_size=2, gradient_accumulation_steps=16
I want to know what the effective batch size will be in this case.
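For reference, the relevant arguments look roughly like this (output_dir is just a placeholder; only the two batch-related args are from my actual run):

```python
from transformers import TrainingArguments

# Only the two batch-related args below are from my run; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=2,    # batch size on each of the 4 GPUs
    gradient_accumulation_steps=16,   # gradients accumulated before each optimizer step
)
```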
When I start the training it shows 24024 total steps, and there are 1025024 training examples in total. My intuition was that per_device_train_batch_size * gradient_accumulation_steps * 4 would be the effective batch size, which comes out to 128. But when I divide 1025024 by 128 I don't get 24024. Is there anything that I'm missing?
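To make the arithmetic concrete, here is the calculation I'm doing, as plain Python with the numbers from my run (nothing Trainer-specific):

```python
# Numbers from my run
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 4                      # from my accelerate config
num_train_examples = 1025024
reported_total_steps = 24024      # what the Trainer shows when training starts

# What I assume the effective batch size per optimizer step is
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)       # 128

# Steps I would expect for one pass over the training set
expected_steps = num_train_examples / effective_batch_size
print(expected_steps)             # 8008.0, not the 24024 the Trainer reports
```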