Embedding from BLIP2

I'm looking for a code sample to get embeddings from the BLIP-2 model. This is what I have so far:

import torch
from PIL import Image
import requests
from transformers import AutoProcessor, Blip2Model

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the BLIP-2 (OPT-2.7b) model in float16 and its processor
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16)
model.to(device)
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")

# Download an example image and preprocess it
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

# Run the vision encoder and take the pooled output as the image embedding
with torch.no_grad():
    image_outputs = model.get_image_features(**inputs)
embedding = image_outputs.pooler_output.cpu().tolist()[0]
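
I assume I can check the embedding size directly from the tensor shape with something like the line below (a sketch; I haven't verified the number against the checkpoint, but I believe the BLIP-2 vision encoder is ViT-g with hidden size 1408):

# Sketch: inspect the dimensionality of the pooled vision output
print(image_outputs.pooler_output.shape)  # expecting (1, hidden_size), i.e. 1408 if I'm not mistaken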

==========================================
Does the above look right? What size of embedding should I expect?
Any help on how to get a text embedding of a similar size from BLIP-2 would be appreciated.
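
For the text side, this is roughly what I had in mind (an untested sketch; I'm assuming Blip2Model.get_text_features just runs the OPT language model, so the hidden size would be 2560 for OPT-2.7b rather than the vision encoder's 1408):

# Untested sketch: tokenize a caption and run it through the language model
text_inputs = processor(text="two cats lying on a couch", return_tensors="pt").to(device)
with torch.no_grad():
    text_outputs = model.get_text_features(**text_inputs, output_hidden_states=True)
# There is no pooler here, so mean-pool the last hidden layer over the tokens
last_hidden = text_outputs.hidden_states[-1]  # (1, seq_len, 2560) for OPT-2.7b, as far as I know
text_embedding = last_hidden.mean(dim=1).cpu().tolist()[0]

I'm not sure these two embeddings even live in the same space (pooled vision output vs. language-model hidden states), so corrections are welcome.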
