I’m using one of the Hugging Face models, sentence-transformers/all-MiniLM-L6-v2, for semantic search. Currently I’m having trouble searching for exact keywords. This matters in cases like the following:
a person’s name - John Davis
a specific id/number - 2023
a keyword containing special characters - Legal-Compliance, Year’23, $200, Q&A.
My documents are long (more than 500 words), so for embedding creation each document is split into an array of chunks of 100 words each; the chunks are then encoded individually and their embeddings averaged.
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# embedding_tokens - list of 100-word text chunks for one document
embedding = model.encode(embedding_tokens)  # shape: (num_chunks, 384)
embeddings = np.mean(embedding, axis=0)     # mean-pool into one 384-d vector
```
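For context, the splitting step can be sketched like this (`chunk_words` is an illustrative helper, not my exact code):

```python
def chunk_words(text, size=100):
    """Split `text` into chunks of at most `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```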
These embeddings are then stored in and searched with OpenSearch, which is currently returning irrelevant or only loosely relevant results at the top.
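The search side is a standard approximate k-NN query against the stored vectors; roughly like this (the field name `embedding` and `k` are placeholders, my actual mapping may differ):

```python
def knn_query(query_vector, field="embedding", k=10):
    """Build an OpenSearch approximate k-NN query body for a query vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }
```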
Can someone help me with this? Is averaging the chunk embeddings the correct way to combine them? And how do I search for numbers and keywords with special characters here?