Retrieve BlipForImageTextRetrieval image features

Saarstriker · September 17, 2023, 11:34pm

Hey!

I am trying to get the image features/embeddings from BlipForImageTextRetrieval.

According to the docs:

image_embeds (torch.FloatTensor of shape (batch_size, output_dim), optional, returned when model is initialized with with_projection=True): The image embeddings obtained by applying the projection layer to the pooler_output.

How can I set with_projection to True? I cannot find any information about this anywhere.
Is there any other way to extract the features?

How would I go about extracting the text features (for a future use case)?

Thank you!

Edit: This is my current workaround:

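import torch
from PIL import Image

def get_vision_features(img_path, model, processor):
    # Preprocess the image into a pixel_values tensor
    img = Image.open(img_path).convert("RGB")
    processed_images = processor(images=img, return_tensors="pt")

    with torch.no_grad():
        # Pooled output of the vision encoder
        vision_embeddings = model.vision_model(
            pixel_values=processed_images.pixel_values, return_dict=True
        ).pooler_output
        # Project into the shared image-text embedding space
        vision_features = model.vision_proj(vision_embeddings)

    return vision_features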
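
For the text side, a minimal sketch along the same lines might look like the following. It assumes the model exposes text_encoder and text_proj next to vision_model and vision_proj, and that the first token of the encoder output is the one to project. The helper name get_text_features is my own and not part of the library. As far as I can tell, this mirrors what the model's forward pass does when use_itm_head=False, but I have not verified that the results match exactly.

```python
import torch
from torch.nn.functional import normalize


@torch.no_grad()
def get_text_features(input_ids, attention_mask, model):
    """Hypothetical helper: projected, L2-normalized text features."""
    # Text-only pass through the text encoder (no image states passed in).
    hidden = model.text_encoder(
        input_ids=input_ids,
        attention_mask=attention_mask,
        return_dict=True,
    ).last_hidden_state
    # Take the first ([CLS]) token, project it, and normalize to unit length.
    return normalize(model.text_proj(hidden[:, 0, :]), dim=-1)


# Usage (assumes `model` and `processor` are already loaded):
# inputs = processor(text=["a photo of a cat"], return_tensors="pt", padding=True)
# text_features = get_text_features(inputs.input_ids, inputs.attention_mask, model)
```

Since these features are normalized, comparing them to image features by dot product only gives a cosine similarity if the image features are normalized the same way.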
