As someone new to the RAG world, I wanted to know which embedding model is actually the best. There are plenty of leaderboards out there, but none of them guarantee the same results on your own dataset. So I tested it myself.
I took:
- 8 datasets (2 private, 2 multilingual, 4 public)
- 13 popular embedding models
- logged latency and accuracy
- and calculated an Elo score by letting an LLM judge which model retrieved the better top-5 list (a minimal sketch of the update math follows this list)
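To make the Elo part concrete, here is roughly what the bookkeeping looks like. This is a sketch under assumptions: the `judge()` function, the K-factor of 32, the starting rating of 1000, and the `top5` / `queries` / `models` names are placeholders for illustration, not my exact setup.

```python
# Minimal Elo bookkeeping over pairwise LLM judgments (illustrative values).
from collections import defaultdict
from itertools import combinations

K = 32           # update step size (assumed)
START = 1000.0   # every model starts at the same rating (assumed)

ratings = defaultdict(lambda: START)

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(model_a: str, model_b: str, outcome: str) -> None:
    """Apply one pairwise judgment: 'A', 'B', or 'tie'."""
    score_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[outcome]
    e_a = expected(ratings[model_a], ratings[model_b])
    ratings[model_a] += K * (score_a - e_a)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - e_a))

# For each query, every pair of models gets judged on their top-5 lists
# (judge, top5, models, and queries are hypothetical names here):
# for query in queries:
#     for model_a, model_b in combinations(models, 2):
#         outcome = judge(query, top5[model_a][query], top5[model_b][query])
#         update(model_a, model_b, outcome)
```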
What I expected was a clear separation. But what I got was the opposite.
- ~85% of models fall in the same narrow 50-Elo range
- The top 4 models are only ~23.5 Elo points apart
- Rank 1 → rank 10 is roughly a 3% difference
The gaps are so small that, in practice, many of these models behave almost the same.
When I looked into why, it made sense: they’re all trained to solve the same narrow task, on similar data, with similar objectives. Naturally, they end up in the same performance range.
So my takeaway from this experiment is that choosing the “perfect” embedding model isn’t a big decision anymore. Maybe the real difference comes from the other parts of the pipeline.
If you want to dive deeper into actual numbers, here is the full breakdown.