Vector search returns almost random results

Gosforth · January 18, 2024, 6:52pm

Hello, dear embedding fans!

I’m starting my adventure with vector search.
My environment is SBERT, where I tried two models to create vector embeddings:

sbert-base-cased-pl & paraphrase-multilingual-MiniLM-L12-v2

I encode phrases that are 300 to 800 characters long: product descriptions in Polish.

I loaded the embeddings into a database and tried to search for similar products (I encode the search phrase with the same model and pass it as the query parameter).
To my surprise, the search results are almost random. It looks like these models do not work at all. For instance, I search for ‘Xero paper 80’ and get product descriptions for… gloves (without a single occurrence of the word ‘paper’ or ‘xero’).

Is there something I should know?
I would appreciate any suggestion.

Regards,

G.

MattiLinnanvuori · January 20, 2024, 10:16am

Can you describe your setup more precisely? I, too, have experienced frequent mistakes in similarity search with embeddings. Using better models can help.

Gosforth · February 10, 2024, 3:34pm

Hi Matti,

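Simple code:

from sentence_transformers import SentenceTransformer

# Load the multilingual model once and reuse it for documents and queries
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
# Encode one product description into a dense vector
emb = model.encode('some sentence ...')

Then I load this embedding into the database and search using its vector search options.
I get the search parameter in exactly the same way:

# Encode the search phrase with the same model
search_param = model.encode('some similar search sentence ...')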
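
One way to tell whether the problem lies in the model or in the database step is to rank a few embeddings in memory and compare the order with what the database returns. The following is only a minimal sketch: the helper names `cosine_similarity` and `rank_by_cosine` are hypothetical, and the three-dimensional toy vectors stand in for real `model.encode(...)` output. It assumes cosine similarity is the intended metric, which may not match how the database is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): in practice these would come from model.encode(...)
doc_vecs = {
    "paper": [0.9, 0.1, 0.0],
    "gloves": [0.0, 0.2, 0.9],
}
query_vec = [1.8, 0.2, 0.1]  # deliberately not unit length

ranking = rank_by_cosine(query_vec, doc_vecs)
```

If the in-memory ranking looks sensible but the database ranking does not, the distance metric or the way the vectors were stored is worth checking before blaming the model.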

I get much better results with full-text search.

For asymmetric search I tried the ‘msmarco-MiniLM-L-6-v3’ model, but the results are also very poor.

MattiLinnanvuori · February 10, 2024, 3:48pm

Full-text search works better when the search terms exactly match words in the text being searched. Vector search can work better when the search terms do not match the text exactly.
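
The keyword side of this contrast can be illustrated with a toy matcher. This is a rough sketch, not how any particular full-text engine works: `tokenize` and `keyword_overlap` are hypothetical helpers, and the product description and queries are made-up examples.

```python
def tokenize(text):
    """Lowercase the text and split on whitespace into a set of tokens."""
    return set(text.lower().split())


def keyword_overlap(query, document):
    """Fraction of query tokens that appear verbatim in the document."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(document)) / len(query_tokens)


# Made-up product description and queries (assumptions for illustration)
description = "Xero paper 80 g A4 500 sheets"
exact_query = "Xero paper 80"
paraphrased_query = "copier sheet pack"

exact_score = keyword_overlap(exact_query, description)
paraphrase_score = keyword_overlap(paraphrased_query, description)
```

Here the exact query matches every token, while the paraphrased query matches none, even though it describes the same product. The second case is where an embedding model is expected to help.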
