Adapting BLIP2 for zero-shot classification

atomwalk12 · August 8, 2024, 8:13am

Were you able to solve the task? I noticed that your approach differs slightly from the one in [1].
In the previous post, the output field qformer_outputs.last_hidden_state is used to synthesize the information from the Q-Former, via the Blip2ForConditionalGeneration class. Your approach seems to use Blip2Model instead.

As far as I understand, the Q-Former already makes use of the vision model's output to generate its own output. Could anyone with more experience explain which of these two methods is more effective?
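To make the question concrete, here is a minimal sketch of how I imagine the Q-Former output being turned into a zero-shot prediction. It is not taken from either post: the random arrays stand in for qformer_outputs.last_hidden_state, the shapes (32 query tokens, hidden size 768) are my assumption about the usual checkpoints, and the class embeddings are hypothetical vectors that would have to live in the same space as the pooled Q-Former output. Mean pooling and cosine similarity are just one possible choice.

```python
import numpy as np


def pool_queries(qformer_hidden):
    """Mean-pool over the query-token axis: (batch, queries, hidden) -> (batch, hidden)."""
    return qformer_hidden.mean(axis=1)


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so that dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def zero_shot_scores(qformer_hidden, class_embeds):
    """Cosine similarity between pooled image embeddings and class embeddings."""
    image_embeds = l2_normalize(pool_queries(qformer_hidden))
    class_embeds = l2_normalize(class_embeds)
    return image_embeds @ class_embeds.T  # shape: (batch, num_classes)


# Stand-in data (assumption): with a real model, qformer_hidden would be
# outputs.qformer_outputs.last_hidden_state converted to a NumPy array.
rng = np.random.default_rng(0)
qformer_hidden = rng.normal(size=(2, 32, 768))  # 2 images, 32 queries, 768 dims
class_embeds = rng.normal(size=(3, 768))        # 3 hypothetical class embeddings

scores = zero_shot_scores(qformer_hidden, class_embeds)
predicted = scores.argmax(axis=1)  # index of the best-matching class per image
print(scores.shape, predicted)
```

If this is roughly what both approaches boil down to, then the open question is mainly which class gives the more useful qformer_outputs and how the class embeddings should be obtained.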
