Hi! I would like to use the CLIP model for image-to-text search (captioning).
Given an image, I would like to retrieve the most similar text in the latent space, I suppose, which would have to be a text that already exists in some candidate set (e.g. from the training data), right? I wouldn't be able to retrieve a generated text from the latent space for that specific image?
How would I go about it? I guess I would need to write a decoder for that?
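For the retrieval part, here is a minimal sketch of what I have in mind, assuming the Hugging Face `transformers` CLIP API and a small hand-made list of candidate captions (the model name, image path, and captions are just placeholders, not a specific recommendation):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP checkpoint (placeholder model name)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A fixed pool of candidate captions to search over (placeholders)
candidate_texts = [
    "a photo of a dog",
    "a photo of a cat",
    "a photo of a city skyline at night",
]

image = Image.open("example.jpg")  # placeholder image path

# Encode the image and all candidate texts in one forward pass
inputs = processor(text=candidate_texts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores;
# the highest-scoring candidate is the retrieved "caption"
probs = outputs.logits_per_image.softmax(dim=-1)
best = probs.argmax(dim=-1).item()
print(candidate_texts[best], probs[0, best].item())
```

So this would only ever return one of the captions I put in `candidate_texts`, never new text, which is why I'm asking about the decoder.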
Best wishes