Finding a good batch size and learning rate for fine-tuning

Hey,
I’ve been playing around with fastai and huggingface transformers over the past months (mostly fine-tuning multi-class classification models on German texts), but I have always wondered how to find a good batch size and learning rate without doing an extensive hyperparameter search.

Well, for experimentation and learning, I’ve been running a lot of hyperparameter searches (>1000 trials) on the German gnad10 dataset on Colab to find the best-performing model. But I cannot do the same kind of extensive hyperparameter search on the larger dataset I have at work, mainly because it takes too long and would cost a lot of money.

Hence, I’m struggling to find a good approach for identifying good values for batch size and learning rate with at most 10-20 trial runs when working with a real-world dataset and limited resources.
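
To make that budget concrete, this is roughly the kind of limited search I have in mind. It is just a sketch using transformers’ Trainer.hyperparameter_search with the Optuna backend; the checkpoint name and the train_ds/eval_ds variables are placeholders for a tokenized gnad10 split, not my actual code:

```python
# Sketch of a limited-budget search (10-20 trials) over batch size and learning rate.
# Checkpoint name and dataset variables are placeholders, not my real setup.
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model_name = "bert-base-german-cased"  # placeholder checkpoint
num_labels = 9                          # gnad10 has 9 news categories

def model_init():
    # A fresh model per trial so trials don't share weights
    return AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )

def hp_space(trial):
    # Only search the two hyperparameters in question
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32, 64]
        ),
    }

training_args = TrainingArguments(
    output_dir="hp_search",
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_ds,  # tokenized gnad10 train split (placeholder variable)
    eval_dataset=eval_ds,    # tokenized gnad10 test split (placeholder variable)
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="optuna",
    n_trials=20,           # the 10-20 trial budget I can afford
    direction="minimize",  # minimize eval loss
)
print(best_run.hyperparameters)
```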

I would appreciate any suggestions for practical solutions, as well as references to related articles like this one.

Here is what I have learned so far:

  • of all the hyperparameters, the choice of batch size and learning rate has the biggest impact on model performance
  • good values for batch size and learning rate depend on the dataset, the architecture/model used, and the number of epochs
  • batch size and learning rate are interdependent (in my experiments, a bigger batch size needed a higher learning rate and vice versa)
  • the default batch size in huggingface’s TrainingArguments is 8, but this did not work well in my experiments. I had to increase it to 32 or 64 and use gradient accumulation, since such batch sizes did not fit into the Colab GPU memory (see the snippet after this list).
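
To illustrate the last point, this is roughly how I get to an effective batch size of 64 on Colab via gradient accumulation; the exact values are just illustrative, not my final configuration:

```python
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps
# Here: 16 * 4 = 64, even though only 16 examples sit on the GPU at once.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,  # what still fits into the Colab GPU memory
    gradient_accumulation_steps=4,   # accumulate gradients to an effective batch size of 64
    learning_rate=5e-5,              # illustrative value; larger effective batches tended to want a higher LR
    num_train_epochs=3,
)
```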

I look forward to hearing about your experiences.
