What is my batch size..?

Hi!

I had the same question, so I started looking for some information. If you check the quick tour page of the documentation before v0.28.0 (e.g., https://huggingface.co/docs/accelerate/v0.27.0/en/quicktour), you will see a warning about the batch size:

The actual batch size for your training will be the number of devices used multiplied by the batch size you set in your script. For instance, training on 4 GPUs with a batch size of 16 set when creating the training dataloader will train at an actual batch size of 64 (4 * 16). If you want the batch size remain the same regardless of how many GPUs the script is run on, you can use the option split_batches=True when creating and initializing Accelerator. Your training dataloader may change length when going through this method: if you run on X GPUs, it will have its length divided by X (since your actual batch size will be multiplied by X), unless you set split_batches=True.

And if you check the same page for versions before v0.24.0, you would only see:

The actual batch size for your training will be the number of devices used multiplied by the batch size you set in your script: for instance training on 4 GPUs with a batch size of 16 set when creating the training dataloader will train at an actual batch size of 64.

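To make the quoted behaviour concrete, here is a minimal sketch of how I read it for versions before v0.28.0 (the toy dataset, the batch size of 16 and the 4-process launch are my own placeholders, not from any specific script):

```python
# Minimal sketch, assuming the script is launched on 4 processes with
# `accelerate launch`; the toy dataset and batch size of 16 are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

dataset = TensorDataset(torch.randn(1024, 10))

# Default behaviour: each of the 4 processes draws its own batch of 16, so the
# observed batch size is 4 * 16 = 64, and len(dataloader) is divided by 4
# after prepare().
accelerator = Accelerator()
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=16))

# With split_batches=True each batch of 16 is instead split across the 4
# processes (4 samples per process), so the observed batch size stays 16.
# In v0.28.0+ this option was moved to DataLoaderConfiguration(split_batches=True).
# accelerator = Accelerator(split_batches=True)
```
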
So, before v0.28.0 it seems you had to take this into account to calculate your actual batch size, unless you were using with accelerator.accumulate(). In recent versions I think you still have to do this multiplication (https://huggingface.co/docs/accelerate/v0.29.3/en/concept_guides/performance#observed-batch-sizes). However, if you are also using gradient accumulation, you can check Performing gradient accumulation with 🤗 Accelerate to see which case applies to your code. If you're not using with accelerator.accumulate(), I think your actual batch size is 3, because (at least in the code you provided) you're not using if (index+1) % gradient_accumulation_steps == 0: the way it is used on that page.

I'd appreciate it if someone could tell me whether I'm wrong, to clarify the current status of HF Accelerate, since the documentation about the actual batch size was updated in v0.28.0. As far as I understand the current documentation (example: assuming 8 processes, 1 GPU each, and batch_size=64; see also the sketch after this list):

  • If you’re using Accelerator(gradient_accumulation_steps=1); with accelerator.accumulate():, then actual batch_size=64 * 8
  • If you’re using Accelerator(gradient_accumulation_steps=2); with accelerator.accumulate():, then actual batch_size=64 * 8 * 2
  • If you’re using Accelerator(gradient_accumulation_steps=1), then actual batch_size=64
  • If you’re using Accelerator(gradient_accumulation_steps=1); if (index+1) % gradient_accumulation_steps == 0: update_optimizer(), then actual batch_size=64 * 8

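To make the four cases above easier to compare, here is a hedged sketch of the two accumulation styles (the toy model, data and learning rate are my own placeholders, not from the original code):

```python
# Hedged sketch of the two accumulation styles discussed above, assuming
# 8 processes launched with `accelerate launch` and batch_size=64 per process.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

gradient_accumulation_steps = 2
accelerator = Accelerator(gradient_accumulation_steps=gradient_accumulation_steps)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(4096, 10), torch.randn(4096, 1))
dataloader = DataLoader(dataset, batch_size=64)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Style 1 -- with accelerator.accumulate(): Accelerate only syncs gradients and
# lets the optimizer step every `gradient_accumulation_steps` batches, so the
# observed batch size would be 64 * 8 * gradient_accumulation_steps.
for inputs, targets in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

# Style 2 -- manual accumulation, as in the gradient accumulation guide:
# only step when (index + 1) is a multiple of gradient_accumulation_steps.
for index, (inputs, targets) in enumerate(dataloader):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss = loss / gradient_accumulation_steps
    accelerator.backward(loss)
    if (index + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
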
I’ve seen that @muellerzr and @marcsun13 are frequent posters here. I’d appreciate it if one of you could shed some light on this “actual” batch size and indicate whether the examples above are correct :slight_smile:

Thanks in advance!
