How to calculate effective batch size while using DDP?

Hi there! I’m doing DDP-style training with the Trainer. My accelerate config is set up to use 4 GPUs, and I’ve set these training args:

  • per_device_train_batch_size=2, gradient_accumulation_steps=16

What will the effective batch size be in this case?

When I start training it shows 24024 total steps, and there are 1025024 training examples in total. My intuition was that per_device_train_batch_size * gradient_accumulation_steps * 4 would be the effective batch size, which comes out to 128. But dividing 1025024 by 128 doesn’t give me 24024. Is there anything I’m missing?
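Here’s a minimal sketch of the calculation I’m doing (variable names are just for illustration, numbers are from my run):

```python
# Back-of-the-envelope check of the effective batch size and expected steps.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 4
num_examples = 1025024

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)                 # 128
print(num_examples / effective_batch_size)  # 8008.0 -- not the 24024 steps the Trainer reports
```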


Check your seq length. If the step count is computed over tokens rather than examples, it works out to:

tokens_processed / (seq_len * effective_batch_size)
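For example (the seq_len and token count here are made-up numbers, not taken from the thread, just to show the shape of the calculation):

```python
# Hypothetical token-based step count; neither value below comes from the original post.
seq_len = 1024
tokens_processed = 1_000_000_000
effective_batch_size = 128  # 2 * 16 * 4, as in the question

steps = tokens_processed / (seq_len * effective_batch_size)
print(round(steps))  # number of optimizer steps implied by the token count (~7629 here)
```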


Resolved! I was miscalculating.
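For anyone who hits the same confusion later: one factor that can account for a gap like this is num_train_epochs, since the step count the Trainer reports covers all epochs, not just one (the default is 3 if you don’t set it). With the numbers above that happens to line up, though your exact miscalculation may be different:

```python
# Hedged check: assumes num_train_epochs was left at the Trainer default of 3.
steps_per_epoch = 1025024 // 128            # 8008 (divides evenly here)
num_train_epochs = 3                        # transformers TrainingArguments default
print(steps_per_epoch * num_train_epochs)   # 24024 -- the step count reported in the question
```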
