Hi all,
Has anyone had success improving sentence embeddings for similarity search in a RAG setup?
I’ve experimented with the following approaches:
- Removing stopwords based on TF-IDF, plus lowercasing, lemmatising, etc. (rough sketch after this list)
- Filtering out low-mutual-information and low-entropy words
- Using various sentence embedding models (e.g. all-MiniLM, MPNet; minimal retrieval sketch below)
- Fine-tuning sentence embedding models via Siamese and triplet networks with Multiple Negatives Ranking (MNR) loss (training sketch below). Would this also work with other sampling strategies, such as hard negative mining? And any tips on improving the embeddings through different training setups?
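
For context, here's roughly the preprocessing step I mean. This is a minimal sketch; the corpus and the IDF threshold are made-up toy values:

```python
# Rough sketch of TF-IDF-based stopword filtering (toy corpus, arbitrary threshold).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A fast fox leapt over the sleepy hound.",
    "Retrieval quality depends on the embedding model.",
]

vectorizer = TfidfVectorizer(lowercase=True)
vectorizer.fit(corpus)

# Words with low IDF appear in most documents and carry little signal,
# so drop them before embedding. The 1.1 cut-off is arbitrary for this toy data.
vocab_idf = dict(zip(vectorizer.get_feature_names_out(), vectorizer.idf_))
low_info = {w for w, idf in vocab_idf.items() if idf < 1.1}

def strip_low_info(text: str) -> str:
    return " ".join(t for t in text.lower().split() if t.strip(".,") not in low_info)

print(strip_low_info(corpus[0]))  # "the" gets dropped, content words survive
```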
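
The baseline retrieval setup looks like this, using the public sentence-transformers checkpoints (the query and docs are toy examples):

```python
# Minimal embed-and-rank sketch with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# swap in "sentence-transformers/all-mpnet-base-v2" for the MPNet variant

docs = [
    "How do I reset my password?",
    "Shipping usually takes 3-5 business days.",
    "You can change your password from the account settings page.",
]
query = "forgot my login credentials"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_emb, doc_emb)[0]  # cosine similarity per doc
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```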
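
And the MNR fine-tuning is roughly the following. If you pass (anchor, positive, hard negative) triples, MultipleNegativesRankingLoss uses the third text as an extra negative on top of the in-batch negatives, which is the hard-negative variant I was asking about. The training examples here are invented placeholders:

```python
# Sketch of MNR fine-tuning with explicit hard negatives (sentence-transformers).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=[
        "How do I reset my password?",                        # anchor (query)
        "You can change your password in account settings.",  # positive
        "Passwords must be at least 8 characters long.",      # hard negative
    ]),
    # ... more (anchor, positive, hard negative) triples mined from your data
]

# MNR treats every other positive in the batch as a negative,
# so larger batch sizes generally help.
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```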
Despite these efforts, I’ve seen minimal improvement in retrieval performance.
Would love to hear if anyone has a solid workflow or other suggestions that worked well for them!
I’m only looking at non-API-based embeddings, since I don’t want to rely on API calls in later stages of production.
Thanks in advance!