BLIP-2 as a classification model

I was wondering whether it is even possible to use the BLIP-2 model (`Blip2ForConditionalGeneration`) for classification-like tasks. I have not been able to find any thorough information on how to use this model with a classification head.

Also, if the answer is yes, which features should be extracted to train the classifier on? I can think of two possibilities:

  1. Use the `last_hidden_state` of the Q-Former and combine these features with the `last_hidden_state` of the vision model; or
  2. Use the pooled output of the Q-Former.
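To make the two options concrete, here is a minimal sketch of a classification head in PyTorch. The tensor shapes follow BLIP-2's defaults (32 query tokens of size 768 from the Q-Former; 257 patch tokens of size 1408 from the ViT-g vision encoder), and random tensors stand in for real features. In practice the features would come from something like `Blip2Model.get_qformer_features(...)` and `Blip2Model.get_image_features(...)` in Transformers; the head itself, the class names, and `num_classes` are my own assumptions, not an official API.

```python
import torch
import torch.nn as nn

batch = 4
# Stand-ins for real BLIP-2 features (shapes assume the default config):
qformer_hidden = torch.randn(batch, 32, 768)    # Q-Former last_hidden_state
vision_hidden = torch.randn(batch, 257, 1408)   # vision model last_hidden_state
num_classes = 10                                # assumed number of target labels


class Blip2ClassifierHead(nn.Module):
    """Linear head over mean-pooled Q-Former (and optionally vision) features."""

    def __init__(self, q_dim=768, v_dim=1408, num_classes=10, use_vision=True):
        super().__init__()
        self.use_vision = use_vision
        in_dim = q_dim + v_dim if use_vision else q_dim
        self.head = nn.Linear(in_dim, num_classes)

    def forward(self, q_feats, v_feats=None):
        # Option 2: pool the Q-Former query tokens (mean pooling here).
        pooled = q_feats.mean(dim=1)                       # (batch, q_dim)
        # Option 1: additionally concatenate pooled vision features.
        if self.use_vision and v_feats is not None:
            pooled = torch.cat([pooled, v_feats.mean(dim=1)], dim=-1)
        return self.head(pooled)                           # (batch, num_classes)


clf = Blip2ClassifierHead(num_classes=num_classes)
logits = clf(qformer_hidden, vision_hidden)
print(logits.shape)  # torch.Size([4, 10])
```

With `use_vision=False` this reduces to option 2 alone; the frozen BLIP-2 backbone would only be run forward, and just the head is trained (e.g. with `nn.CrossEntropyLoss`).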

I feel like this is an interesting topic, but unfortunately I was not able to find much information about it.
Any related tips would be really appreciated. Thanks!