How to get an embedding of size 512 using CLIP equal to open_clip?

nielsr · February 19, 2024, 9:19pm

Hi,

Yes, for that you need to load the `CLIPVisionModelWithProjection` class:

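```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, CLIPVisionModelWithProjection

# Vision tower plus projection layer of the LAION ViT-B/32 checkpoint
model = CLIPVisionModelWithProjection.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Resize, center-crop and normalize the image into pixel_values
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Projected image embedding, shape (1, 512)
image_embeds = outputs.image_embeds
```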
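To check the result against open_clip, it can help to compare directions rather than raw values: `image_embeds` from `CLIPVisionModelWithProjection` is not L2-normalized, and open_clip's `encode_image` only normalizes when called with `normalize=True`. The helper below is a minimal sketch; `embedding_cosine_similarity` is a hypothetical name (not part of either library), and the random tensors are placeholders standing in for the two real 512-dim embeddings.

```python
import torch
import torch.nn.functional as F


def embedding_cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise cosine similarity between two (batch, dim) embedding tensors.

    Both inputs are L2-normalized along the last dimension first, so the
    result is the same whether or not either side was already normalized.
    """
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {tuple(a.shape)} vs {tuple(b.shape)}")
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return (a * b).sum(dim=-1)


torch.manual_seed(0)
# Placeholders: in practice, the Transformers embedding and the open_clip one
hf_embeds = torch.randn(1, 512)
other_embeds = hf_embeds * 3.0  # same direction, different scale

print(embedding_cosine_similarity(hf_embeds, other_embeds))  # close to 1.0
```

With real inputs you would pass `outputs.image_embeds` and the open_clip features for the same image; a value very close to 1.0 suggests the two pipelines agree, while a noticeably lower value may point to a difference in image preprocessing.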

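`CLIPVisionModelWithProjection` includes the projection layer, which maps the vision transformer's pooled output (768-dimensional for ViT-B/32) into the 512-dimensional embedding space shared with the text embeddings. Plain `CLIPVisionModel` stops before this layer, so its pooled output is 768-dimensional rather than the 512-dimensional vector that open_clip's `encode_image` returns.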
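To see what the projection layer does without downloading weights, here is a minimal offline sketch using tiny, randomly initialized configs. The layer sizes are arbitrary assumptions chosen so it runs quickly; only `projection_dim=512` mirrors ViT-B/32. It checks that `image_embeds` is exactly `visual_projection` applied to the vision transformer's `pooler_output`, and that `CLIPTextModelWithProjection` produces `text_embeds` of the same width.

```python
import torch
from transformers import (
    CLIPTextConfig,
    CLIPTextModelWithProjection,
    CLIPVisionConfig,
    CLIPVisionModelWithProjection,
)

torch.manual_seed(0)

# Hypothetical tiny sizes, NOT the LAION checkpoint; only projection_dim matches ViT-B/32
vision_config = CLIPVisionConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    image_size=32,
    patch_size=8,
    projection_dim=512,
)
text_config = CLIPTextConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    max_position_embeddings=16,
    eos_token_id=999,  # text pooling reads the hidden state at the EOS position
    projection_dim=512,
)

vision_model = CLIPVisionModelWithProjection(vision_config).eval()
text_model = CLIPTextModelWithProjection(text_config).eval()

pixel_values = torch.randn(2, 3, 32, 32)
input_ids = torch.randint(0, 999, (2, 16))
input_ids[:, -1] = 999  # put an EOS token at the end of each sequence

with torch.no_grad():
    image_embeds = vision_model(pixel_values=pixel_values).image_embeds
    # Recompute by hand: pooled output of the vision transformer, then the linear projection
    pooled = vision_model.vision_model(pixel_values=pixel_values).pooler_output
    manual_embeds = vision_model.visual_projection(pooled)
    text_embeds = text_model(input_ids=input_ids).text_embeds

print(pooled.shape)        # torch.Size([2, 64]), hidden_size before projection
print(image_embeds.shape)  # torch.Size([2, 512]), projection_dim
print(text_embeds.shape)   # torch.Size([2, 512]), same width as the image embeddings
print(torch.allclose(image_embeds, manual_embeds))  # True
```

With the real checkpoint the same relationship holds, with a 768-dimensional pooled output instead of the 64 used here.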

