BLIP-2 - Should the image + language model be frozen by default?

I was reading through the BLIP-2 paper and saw that the image encoder and language model are kept frozen during training.

In the Hugging Face implementation, the vision and language models are initialized without freezing (unless I’m missing something in the implementation). I think these should be frozen by default, since that is the training approach used in the paper; otherwise the model ends up training far more parameters than expected and doesn't get the intended bottlenecking through the Q-Former.
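
For reference, here is a minimal sketch of how the two towers can be frozen manually with the current Hugging Face classes (assuming the `vision_model` / `language_model` attribute names of `Blip2ForConditionalGeneration` and the `Salesforce/blip2-opt-2.7b` checkpoint; names may differ across versions):

```python
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Freeze the ViT image encoder and the language model so that only the
# Q-Former (and the projection layer) receives gradients, as in the paper.
for param in model.vision_model.parameters():
    param.requires_grad = False
for param in model.language_model.parameters():
    param.requires_grad = False
```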

In the upstream implementation by Salesforce, both the ViT and the language model are frozen.