Transformers.js - image similarity on a portion of an image

danicimi · June 7, 2024, 3:22pm

Hi everyone,
I’m trying to build a proof of concept for a visual annotation tool: let’s say I have a collection of images (think of paintings) and I want to find where a visual detail (e.g. the signature of a particular artist) occurs in the dataset. I don’t need to localize it in the pictures, I just want to retrieve a list of images that contain a similar detail.

I treated the problem as an image similarity task. As a very naive approach, I ran the ImageFeatureExtractionPipeline on each image in my dataset and used the entire resulting Tensor (which I guess contains all the hidden states?) as the embedding.
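For readers following along, here is a minimal sketch of how the "entire Tensor as embedding" step could look. It assumes the pipeline output exposes `data` (a typed array) and `dims` (the shape), as Transformers.js Tensor objects do; the helper name `tensorToEmbedding` and the returned record shape are made up for illustration and are not the code used in this post.

```javascript
// Hypothetical helper: turn a Tensor-like object into a plain record
// that can be written to disk with JSON.stringify.
// Assumption: `tensor.data` is a typed array and `tensor.dims` is the shape.
function tensorToEmbedding(tensor) {
  const expected = tensor.dims.reduce((acc, d) => acc * d, 1);
  if (tensor.data.length !== expected) {
    throw new Error(`Shape ${tensor.dims} does not match ${tensor.data.length} values`);
  }
  return {
    dims: Array.from(tensor.dims),
    // Flatten every value into one long vector, with no pooling applied.
    embedding: Array.from(tensor.data),
  };
}
```

Because nothing is pooled, two embeddings can only be compared when the underlying tensors have the same shape.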

I saved those to the filesystem, then cropped a detail from one of the images and used it as a test: I computed its embedding in the same way and then computed the cosine similarity against each stored entry.
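The comparison step described above can be sketched roughly as follows. This is plain JavaScript with no library calls; the function names and the `{ id, embedding }` entry shape are assumptions for illustration, not the code actually used here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat an all-zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored entry against the query and sort best match first.
// `entries` is assumed to be an array of { id, embedding } records.
function rankBySimilarity(queryEmbedding, entries) {
  return entries
    .map(({ id, embedding }) => ({
      id,
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score);
}
```

Calling `rankBySimilarity(cropEmbedding, storedEntries)` would return the stored images ordered from most to least similar to the crop.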

Results vary a lot depending on the input and are generally not that satisfying. The model that achieved the best results was vit-base-patch16-224-in21k; I also tried Dinov2-base, but it seems to perform worse.

Is there any other approach I could consider? Is a CNN like ResNet better suited to this kind of task? Should I focus on some other model?

Any suggestion is welcome,
thanks!
