Vector search returns almost random results

Hello Dear embedding fans!

I’m starting my adventure with vector search.
My environment is SBERT, where I tried two models to create vector embeddings:

sbert-base-cased-pl & paraphrase-multilingual-MiniLM-L12-v2

I encode phrases that are 300-800 characters long: product descriptions in Polish.

I loaded the embeddings into a database and tried to search for similar products (I encode the search phrase with the same model and pass it as the query parameter).
To my surprise, the search results are almost random. It looks like these models do not work at all. For instance, I search for ‘Xero paper 80’ and the most similar product descriptions I get are... gloves (not even a single occurrence of ‘paper’ or ‘xero’ there).

Is there something I should know?
I would appreciate any suggestion.

Regards,

G.

Can you describe your setup more precisely? I, too, have experienced frequent mistakes in similarity search with embeddings. Using better models can help.

Hi Matti,

simple code:

from sentence_transformers import SentenceTransformer

# Load the multilingual model and encode one product description
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
emb = model.encode('some sentence ...')

Then I load this embedding into the database and search using its vector search options.
I get the search parameter in exactly the same way:

search_param = model.encode('some similar search sentence ...')

I get much better results with full-text search.

For asymmetric search I tried the ‘msmarco-MiniLM-L-6-v3’ model, but the results are also very poor.

Full-text search works better when the search terms exactly match words in the text. Vector search can work better when the search terms do not match the text exactly.
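
To find out whether the random-looking results come from the embeddings or from the database's vector search settings, it may help to rank a handful of products by cosine similarity outside the database and compare. Below is a minimal sketch of that check. It assumes cosine similarity is the intended metric; the function names and the toy vectors are made up for illustration, and in practice the vectors would come from model.encode(...) as in the code above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as "not similar"
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k documents, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-ins for real embeddings (hypothetical values).
# With real data, something like:
#   doc_vecs = model.encode(descriptions).tolist()
#   query_vec = model.encode('Xero paper 80').tolist()
doc_vecs = [
    [0.9, 0.1, 0.0],  # stands in for a paper product
    [0.0, 0.2, 0.9],  # stands in for gloves
    [0.8, 0.3, 0.1],  # stands in for another paper product
]
query_vec = [1.0, 0.0, 0.0]

results = rank_by_cosine(query_vec, doc_vecs, top_k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

If this brute-force ranking looks sensible but the database ranking does not, the distance metric or index configuration in the database is the more likely suspect. If both look random, the problem is more likely the model or the text being encoded.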