I’m using one of the Hugging Face models, sentence-transformers/all-MiniLM-L6-v2, for semantic search. Currently I’m having trouble searching for exact keywords. This matters in cases like the following:
a person’s name - John Davis
a specific id/number - 2023
a keyword containing special characters - Legal-Compliance, Year’23, $200, Q&A.
My documents are long (more than 500 words), so for embedding creation each document is split into an array of chunks of 100 words each; the chunks are then encoded individually and their embeddings averaged.
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# embedding_tokens - list of 100-word text chunks for one document
embedding = model.encode(embedding_tokens)  # shape: (num_chunks, 384)
embeddings = np.mean(embedding, axis=0)     # mean-pool into one 384-d vector
```
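For context, the splitting step can be sketched like this (`chunk_words` is an illustrative helper, not my exact code):

```python
def chunk_words(text, size=100):
    """Split `text` into chunks of at most `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```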
These embeddings are then stored in and searched with OpenSearch, which is currently returning irrelevant or only loosely relevant results at the top.
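The search side is a standard approximate k-NN query against the stored vectors; roughly like this (the field name `embedding` and `k` are placeholders, my actual mapping may differ):

```python
def knn_query(query_vector, field="embedding", k=10):
    """Build an OpenSearch approximate k-NN query body for a query vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }
```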
Can someone help me with this? Is averaging the chunk embeddings the correct way to combine them? And how do I search for numbers and keywords with special characters here?