Huggingface datasets, faiss, sbert and cosine similarity

JustSaX · January 1, 2023, 5:08pm

Hi

I’d like to set up a question answering system using a pretrained SBERT bi-encoder model. Can I use the FAISS functionality in Hugging Face datasets to compare the question vector with my encoded corpus? To my understanding, SBERT uses cosine similarity while FAISS uses the dot product for vector similarity, so I guess they are not compatible.

Or is there a better option in the huggingface datasets that I should use?

Thanks in advance!

lhoestq · January 3, 2023, 10:43am

You can use FAISS with the inner product metric. You just have to normalize your vectors to unit length (L2 normalization) first, so that the inner product is equal to the cosine similarity.
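
To illustrate why this works, here is a minimal NumPy sketch. The embedding values are made-up stand-ins for SBERT output, and `l2_normalize` and `cosine_similarity` are hypothetical helper names, not part of any library. It only checks that the inner product of unit-length vectors matches the cosine similarity of the original vectors.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return vectors / np.clip(norms, 1e-12, None)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Plain cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors standing in for encoded corpus passages and a question
# (assumed values, not real model output).
corpus = np.array(
    [[1.0, 2.0, 3.0], [-1.0, 0.5, 2.0], [4.0, 0.0, -1.0]], dtype="float32"
)
question = np.array([[0.5, 1.0, 1.5]], dtype="float32")

corpus_unit = l2_normalize(corpus)
question_unit = l2_normalize(question)

# Inner product of unit vectors: the score an inner-product index ranks by.
inner_products = corpus_unit @ question_unit[0]

# Cosine similarity computed directly on the unnormalized vectors.
cosines = np.array([cosine_similarity(row, question[0]) for row in corpus])

# Index of the best-matching corpus entry under the inner-product score.
best = int(np.argmax(inner_products))
```

In practice, and as far as I know, you can get normalized vectors straight from sentence-transformers with `model.encode(..., normalize_embeddings=True)`, and then build the index with `dataset.add_faiss_index(column=..., metric_type=faiss.METRIC_INNER_PRODUCT)`. Check both against the versions you have installed.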
