Hi!
I fine-tuned BLIP-2 (the 2.7B variant) to generate image descriptions for the HM dataset, using its images and article descriptions as captions.
I was curious about the effect of different parameters, so I ran a grid search over the learning rate, the batch size, and the LoRA target layers:
- Learning rate: 1e-5, 5e-5, 1e-4, 5e-4
- Effective batch size: 16, 32 (using gradient accumulation whenever the full batch did not fit into GPU memory)
- LoRA layers: all-linear (LoRA applied to all linear layers) vs. QV (only the query and value layers adapted)

In total this gives 4 × 2 × 2 = 16 combinations.
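Roughly, the grid looks like this (a minimal sketch only; the per-device batch size and the LoRA module names are placeholders, not my exact training script):

```python
from itertools import product

# Sketch of the grid: 4 learning rates x 2 effective batch sizes x 2 LoRA
# target configurations = 16 runs. Values marked "assumed" are placeholders.
learning_rates = [1e-5, 5e-5, 1e-4, 5e-4]
effective_batch_sizes = [16, 32]
lora_targets = {
    "all-linear": "all-linear",      # PEFT shortcut: adapt every linear layer
    "QV": ["q_proj", "v_proj"],      # assumed names of the query/value projections
}
per_device_batch_size = 8            # assumed value that fits into GPU memory

for lr, eff_bs, (name, targets) in product(
    learning_rates, effective_batch_sizes, lora_targets.items()
):
    # Gradient accumulation keeps the effective batch size fixed when the
    # full batch does not fit on the GPU.
    grad_accum_steps = max(1, eff_bs // per_device_batch_size)
    print(
        f"run: lr={lr}, effective_bs={eff_bs}, "
        f"accum_steps={grad_accum_steps}, lora={name} -> {targets}"
    )
```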
I used Weights and Biases to track my experiments and got the following plots. I am sharing this because I would like to get a sanity check to see if I interpreted the results correctly.
There are some obvious outliers in the training runs; if you are interested, let me know and I can share those settings in detail. Otherwise, my general impression is that:
- most settings work well, given that both the training and validation loss decrease.
- targeting only the QV layers does not change the validation loss drastically, but it roughly halves the runtime, from about 4 h to 2 h per epoch (20,847 batches per epoch); see the sketch below for comparing the trainable parameters of the two configurations.
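One way to sanity-check that runtime difference is to look at how many parameters each LoRA configuration actually trains. This is a minimal sketch assuming the Salesforce/blip2-opt-2.7b checkpoint and a recent PEFT version that supports `target_modules="all-linear"`; the rank, alpha, dropout, and module names are placeholders, not my exact settings:

```python
from transformers import Blip2ForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Compare trainable parameter counts for the two LoRA configurations.
# Assumptions: Salesforce/blip2-opt-2.7b checkpoint, PEFT with the
# "all-linear" shortcut, placeholder rank/alpha/dropout values.
configs = {
    "all-linear": "all-linear",
    "QV": ["q_proj", "v_proj"],   # assumed names of the query/value projections
}

for name, targets in configs.items():
    base = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
    lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=targets)
    peft_model = get_peft_model(base, lora_cfg)
    print(name)
    peft_model.print_trainable_parameters()
```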
I'd be happy to hear some opinions; if you have any questions, let me know.