Using an image's text and image's embedding from clip with FAISS

melampus · November 20, 2023, 1:15pm

I have a dataset of 50K images, each with an associated text description. I want to use each image’s text and image in a semantic search database such as FAISS.

I have been able to use CLIP to embed either each image or each text description. However, given that the text descriptions should aid in classification, I am wondering if there is a way to combine an image’s text embedding and image embedding into a single embedding. Is simply concatenating the two embeddings a possible solution?

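import faiss
import numpy as np

# Assumed: 1024 = 512-d text + 512-d image embedding (depends on the CLIP variant)
index = faiss.IndexFlatL2(1024)

# get_image_embedding / get_text_embedding are my own helpers; each returns an (n, d) array
image_embedding = get_image_embedding(clip_model)
text_embedding = get_text_embedding(clip_model)
combined_embedding = np.concatenate((text_embedding, image_embedding), axis=1)
index.add(combined_embedding.astype(np.float32))  # FAISS expects float32 input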
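One detail worth checking with plain concatenation is scale: if the text and image embeddings have different norms, the larger one dominates the L2 distance. A minimal sketch of one way to handle that, assuming 512-d embeddings and using random arrays as stand-ins for CLIP outputs (`l2_normalize`, `combine`, and `text_weight` are hypothetical names, not part of CLIP or FAISS): normalize each modality, then scale by the square root of a weight so the dot product of two combined vectors is a weighted sum of the per-modality cosine similarities.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so both modalities start on equal footing.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def combine(text_emb, image_emb, text_weight=0.5):
    # sqrt weights: dot(combined_a, combined_b) =
    #   text_weight * cos_text + (1 - text_weight) * cos_image
    t = l2_normalize(text_emb) * np.sqrt(text_weight)
    i = l2_normalize(image_emb) * np.sqrt(1.0 - text_weight)
    return np.concatenate((t, i), axis=1).astype(np.float32)

# Random stand-ins for CLIP embeddings (assumed 512-d each).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(5, 512))
image_emb = rng.normal(size=(5, 512))

combined = combine(text_emb, image_emb, text_weight=0.3)
```

Because every combined row has unit norm, ranking by L2 distance and ranking by inner product give the same order, so either a flat L2 or a flat inner-product index should work on these vectors.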

Would a better approach be to maintain two separate indexes, one for text and one for imagery, and then take the union of their search results?
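If the two-index route is taken, the two result lists still need to be merged into one ranking. A minimal sketch using reciprocal rank fusion (a generic rank-merging technique, not something CLIP or FAISS provides), assuming each index returns item ids ordered best-first; `reciprocal_rank_fusion` and `rrf_const` are hypothetical names and the id lists are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=10, rrf_const=60):
    # Each id earns 1 / (rrf_const + rank) from every list it appears in.
    scores = defaultdict(float)
    for ids in ranked_lists:
        for rank, idx in enumerate(ids, start=1):
            if idx < 0:
                continue  # FAISS pads missing results with -1
            scores[idx] += 1.0 / (rrf_const + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

# Made-up top-4 ids from a text index and an image index for one query.
text_hits = [7, 2, 9, 4]
image_hits = [2, 5, 7, 1]

fused = reciprocal_rank_fusion([text_hits, image_hits], k=5)
print(fused)  # [2, 7, 5, 9, 1]
```

Working on ranks rather than raw distances avoids having to make text distances and image distances comparable, and items found by both indexes naturally rise to the top.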

melampus · November 20, 2023, 1:38pm

Somewhat unrelated to transformers, but I found that FAISS does support combining search results from multiple indexes via a class called ResultHeap.
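As far as I understand it, ResultHeap keeps, for each query, the k best (distance, id) pairs seen across several batches of results. The sketch below reproduces that merging idea in plain NumPy so it runs without FAISS; it is not FAISS's implementation, and `merge_topk` and the sample arrays are hypothetical.

```python
import numpy as np

def merge_topk(results, k):
    # results: list of (D, I) pairs shaped (n_queries, k_i), smaller distance = better.
    D = np.concatenate([d for d, _ in results], axis=1)
    I = np.concatenate([i for _, i in results], axis=1)
    # Sort each query's pooled candidates by distance and keep the k closest.
    order = np.argsort(D, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(D, order, axis=1), np.take_along_axis(I, order, axis=1)

# Made-up results for one query from two indexes.
D_a = np.array([[0.1, 0.4, 0.9]], dtype=np.float32)
I_a = np.array([[3, 8, 1]])
D_b = np.array([[0.2, 0.3, 0.7]], dtype=np.float32)
I_b = np.array([[12, 15, 10]])

D, I = merge_topk([(D_a, I_a), (D_b, I_b)], k=4)
print(I.tolist())  # [[3, 12, 15, 8]]
```

This kind of merge assumes the distances from the different indexes are on the same scale and that ids do not repeat across them, which holds for shards of one collection. With one text index and one image index over the same items, the same id can appear twice and the distances may not be comparable, so some deduplication or rescoring would probably be needed.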

panigrah · November 21, 2023, 1:32pm

There is this paper that suggests a way. They are just concatenating the two vectors together.

Depending on the vector sizes you have, you can look at ways to reduce the size if needed. There are a few different approaches. I haven’t done this though.
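One common way to shrink the concatenated vectors is PCA. A minimal NumPy sketch, assuming 1024-d combined embeddings and an arbitrary target of 128 dimensions, with random data standing in for real embeddings (`fit_pca` and `apply_pca` are hypothetical helpers; FAISS also ships its own `PCAMatrix` transform, which is not used here):

```python
import numpy as np

def fit_pca(x, out_dim):
    # Center the data, then take the top right-singular vectors as components.
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:out_dim].T  # components has shape (in_dim, out_dim)

def apply_pca(x, mean, components):
    # Project onto the components; float32 is what FAISS expects.
    return ((x - mean) @ components).astype(np.float32)

# Random stand-in for concatenated text + image embeddings.
rng = np.random.default_rng(0)
combined = rng.normal(size=(500, 1024)).astype(np.float32)

mean, components = fit_pca(combined, out_dim=128)
reduced = apply_pca(combined, mean, components)
print(reduced.shape)  # (500, 128)
```

The same `mean` and `components` have to be applied to query vectors at search time, and how much retrieval quality is lost at a given `out_dim` would need to be measured on the real data.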
