Hi!
I had the same question, so I started looking for some information. If you check the Quick tour documentation page for versions before v0.28.0 (e.g., https://huggingface.co/docs/accelerate/v0.27.0/en/quicktour), you will see a warning about the batch size:
> The actual batch size for your training will be the number of devices used multiplied by the batch size you set in your script. For instance, training on 4 GPUs with a batch size of 16 set when creating the training dataloader will train at an actual batch size of 64 (4 * 16). If you want the batch size to remain the same regardless of how many GPUs the script is run on, you can use the option split_batches=True when creating and initializing Accelerator. Your training dataloader may change length when going through this method: if you run on X GPUs, it will have its length divided by X (since your actual batch size will be multiplied by X), unless you set split_batches=True.
And if you check the same page but for versions before v0.24.0 you would only see:
> The actual batch size for your training will be the number of devices used multiplied by the batch size you set in your script: for instance training on 4 GPUs with a batch size of 16 set when creating the training dataloader will train at an actual batch size of 64.
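To make that quoted behaviour concrete, here is a minimal sketch (my own toy example, not from the docs; the dataset and batch size of 16 are placeholders). Launched with `accelerate launch` on 4 processes it illustrates the multiplication and the dataloader length division described above:

```python
# Minimal sketch (my own toy example) of the behaviour described in the
# quoted warning. Launch with e.g. `accelerate launch` on 4 processes to
# see the effect; the dataset and batch size are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))

# Default behaviour: every process draws its own batches of 16, so on
# 4 GPUs the observed/actual batch size is 4 * 16 = 64, and the prepared
# dataloader's length is divided by 4.
accelerator = Accelerator()
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=16))
print(accelerator.num_processes, len(dataloader))

# With split_batches=True (the option mentioned in the quoted docs), each
# batch of 16 is instead split across the processes, so the observed batch
# size stays 16. I believe this option has moved in newer releases, so
# treat this line as version dependent:
# accelerator = Accelerator(split_batches=True)
```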
So, before v0.28.0 it seems that you had to take this multiplication into account to calculate your actual batch size, unless you were using `with accelerator.accumulate()`. In recent versions I think you still have to do this operation (https://huggingface.co/docs/accelerate/v0.29.3/en/concept_guides/performance#observed-batch-sizes). However, if you are also using gradient accumulation, you can check "Performing gradient accumulation with 🤗 Accelerate" to see which case applies to your code. If you're not using `with accelerator.accumulate()`, I think your actual batch size is 3, because you're not using `if (index+1) % gradient_accumulation_steps == 0:` as it is done on that page, at least with the code you provided.

I'd like someone to tell me if I'm wrong, to clarify the current status of HF Accelerate, since the documentation about the actual batch size was updated in v0.28.0. As far as I understand the current documentation (example: assuming 8 processes, 1 GPU each, and `batch_size=64`; see the sketch after the list for the two loop structures I mean):
- If you're using `Accelerator(gradient_accumulation_steps=1)` with `accelerator.accumulate()`, then the actual batch size is 64 * 8 = 512.
- If you're using `Accelerator(gradient_accumulation_steps=2)` with `accelerator.accumulate()`, then the actual batch size is 64 * 8 * 2 = 1024.
- If you're using `Accelerator(gradient_accumulation_steps=1)` without `accelerator.accumulate()`, then the actual batch size is 64.
- If you're using `Accelerator(gradient_accumulation_steps=1)` with a manual `if (index+1) % gradient_accumulation_steps == 0: update_optimizer()`, then the actual batch size is 64 * 8.
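For reference, this is a minimal sketch of the two loop structures the list refers to: the `accelerator.accumulate()` context manager versus a manual `(index + 1) % gradient_accumulation_steps` check. The toy model, optimizer, and data are placeholders I made up, and the per-device batch size of 64 matches the example above; it only shows the shape of the two loops, not a recommendation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

gradient_accumulation_steps = 2
accelerator = Accelerator(gradient_accumulation_steps=gradient_accumulation_steps)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(4096, 10), torch.randn(4096, 1))
dataloader = DataLoader(dataset, batch_size=64)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Pattern A: let Accelerate handle accumulation. The optimizer only really
# steps every `gradient_accumulation_steps` batches, so the effective batch
# size is 64 * num_processes * gradient_accumulation_steps.
for inputs, targets in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

# Pattern B: manual accumulation, similar to the performance guide: you only
# step the optimizer every `gradient_accumulation_steps` batches yourself.
for index, (inputs, targets) in enumerate(dataloader):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss = loss / gradient_accumulation_steps
    accelerator.backward(loss)
    if (index + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```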
I've seen that @muellerzr and @marcsun13 are frequent posters. I'd appreciate it if one of you could shed some light on this "actual" batch size and indicate whether the examples above are correct.
Thanks in advance!