Hello, I was wondering whether there is any way, or any example, showing how to extract text and image features from BLIP-2 in the same embedding space, ideally for use in image-text matching. Or is this model perhaps not meant for that task? I can extract the text and image features, but they are not in the same space and do not have the same shape.
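For reference, this is roughly what I am doing with the `transformers` `Blip2Model` (a minimal sketch; the checkpoint, image path, and caption are just placeholders), which is where I see the mismatched shapes:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2Model

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b")
model.eval()

image = Image.open("example.jpg").convert("RGB")

with torch.no_grad():
    # Vision encoder output: roughly (batch, num_patches, 1408) for the ViT-g backbone
    image_inputs = processor(images=image, return_tensors="pt")
    image_feats = model.get_image_features(**image_inputs).last_hidden_state

    # Language model hidden states: roughly (batch, seq_len, 2560) for OPT-2.7b
    text_inputs = processor(text="two cats lying on a couch", return_tensors="pt")
    text_out = model.get_text_features(**text_inputs, output_hidden_states=True)
    text_feats = text_out.hidden_states[-1]

print(image_feats.shape, text_feats.shape)  # different widths, so not directly comparable
```

As far as I can tell, these come from the vision encoder and the language model respectively, so there is no shared projection space like the ITC/ITM heads in the original codebase.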
For example, the original BLIP-2 codebase has an example of how to use it for image-text matching, but it seems that this feature is not available in the HuggingFace version: LAVIS/examples/blip2_image_text_matching.ipynb at 3446bac20c5646d35ae383ebe6d13cec4f8b00cb · salesforce/LAVIS · GitHub
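If I remember that notebook correctly, the LAVIS side looks roughly like this (the image path and caption are placeholders):

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loads the Q-Former based image-text matching model and its preprocessors
model, vis_processors, text_processors = load_model_and_preprocess(
    "blip2_image_text_matching", "pretrain", device=device, is_eval=True
)

raw_image = Image.open("example.jpg").convert("RGB")
img = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
txt = text_processors["eval"]("two cats lying on a couch")

# ITM head: binary match / no-match classification
itm_logits = model({"image": img, "text_input": txt}, match_head="itm")
itm_score = torch.nn.functional.softmax(itm_logits, dim=1)[:, 1].item()

# ITC head: similarity between projected image and text embeddings
itc_score = model({"image": img, "text_input": txt}, match_head="itc").item()

print(itm_score, itc_score)
```

This is exactly the kind of shared-space scoring I am looking for in the HuggingFace version. Is there an equivalent, or a recommended way to get the Q-Former projected embeddings for both modalities there?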