Trainer with adaptive batch size?

I’m training a custom T5 model from scratch using the Hugging Face Trainer. My training examples have input and label lengths ranging from 200 to 2000 tokens, with the average example around 300 tokens for both input and label.

My GPU only has enough memory for a batch size of 1 when the input and label lengths are at their maximum. Therefore, my Trainer instance uses per_device_train_batch_size = 1 and gradient_accumulation_steps = 128. However, this is quite wasteful for batches containing only short examples. For batches where the inputs and labels are at most 500 tokens, for instance, my GPU could handle per_device_train_batch_size = 16 with gradient_accumulation_steps = 8 for the same effective batch size. If I could vary these two parameters according to input and label length as Trainer iterates through my training data (keeping the effective batch size constant), I could improve throughput and shorten training time considerably.
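
To make the idea concrete, here is roughly the length-to-settings mapping I have in mind. Only the ≤ 500 bucket is something I've actually measured; the intermediate threshold and batch size are guesses, and every pair multiplies out to the same effective batch size of 128:

```python
# Hypothetical mapping from the longest sequence in a batch to
# (per_device_train_batch_size, gradient_accumulation_steps).
# Each pair keeps the effective batch size at 16 * 8 = 4 * 32 = 1 * 128 = 128.
LENGTH_BUCKETS = [
    (500, 16, 8),     # inputs/labels up to 500 tokens (measured on my GPU)
    (1000, 4, 32),    # guess: up to 1000 tokens
    (2000, 1, 128),   # worst case: the current fixed setting
]

def batch_settings(max_len: int) -> tuple[int, int]:
    """Return (batch_size, grad_accum_steps) for a batch whose longest
    input/label is max_len tokens."""
    for threshold, batch_size, accum_steps in LENGTH_BUCKETS:
        if max_len <= threshold:
            return batch_size, accum_steps
    return 1, 128  # fall back to the conservative worst-case setting
```

The open question is how to get Trainer to respect something like this while it builds batches, since both values are normally fixed once in TrainingArguments.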

Any thoughts on how I might achieve this? (Would it be easy to modify trainer.py? If so, where in the file? Or is there a straightforward approach I’m not seeing?)
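
For reference, my current setup looks roughly like this (the model, dataset, and collator are defined elsewhere; the names and output path are placeholders):

```python
from transformers import Trainer, TrainingArguments

# Configuration sized for the worst case: only a batch size of 1 fits the
# 2000-token examples, so 128 accumulation steps give an effective batch
# size of 128 regardless of example length.
training_args = TrainingArguments(
    output_dir="t5-from-scratch",      # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=128,
)

trainer = Trainer(
    model=model,                  # custom T5 model, defined elsewhere
    args=training_args,
    train_dataset=train_dataset,  # tokenized examples, 200-2000 tokens each
    data_collator=data_collator,  # pads each batch dynamically (my setup)
)
trainer.train()
```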