SOTA Pure Dense Retrieval on BEIR: Beating Hybrid Methods with Nomic Embed v1.5

Excited to share Cathedral-BEIR, a straightforward dense retrieval approach that achieves state-of-the-art results on the BEIR benchmark—0.5881 nDCG@10—using just 768D normalized embeddings from Nomic Embed v1.5 and cosine similarity.

No reranking, no sparse components, no extras: simply prefix queries with “search_query:” (and documents with “search_document:”) for better alignment, encode with the model, L2-normalize, and retrieve via dot product, which equals cosine similarity on unit vectors.
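The scoring step is compact enough to sketch. Below is a minimal numpy illustration (random stand-in vectors instead of real model outputs; in the actual run the embeddings come from Nomic Embed v1.5 with the “search_query:” / “search_document:” prefixes applied before encoding):

```python
import numpy as np

# Hypothetical stand-in embeddings. In the real pipeline these come from
# nomic-embed-text-v1.5, e.g. encoding "search_query: <text>" for queries
# and "search_document: <text>" for passages.
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(5, 768)).astype(np.float32)
# Make the query a slightly perturbed copy of document 2.
query_emb = doc_emb[2] + 0.01 * rng.normal(size=768).astype(np.float32)

def l2_normalize(x):
    # Scale vectors to unit length so dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

docs = l2_normalize(doc_emb)
q = l2_normalize(query_emb)

scores = docs @ q              # cosine similarity via dot product
ranking = np.argsort(-scores)  # best match first
print(ranking[0])              # → 2
```

On normalized vectors, ranking by dot product and ranking by cosine similarity are identical, which is why the recipe needs nothing beyond matrix multiplication at search time.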

Tested on SciFact (0.7036 nDCG@10), NFCorpus (0.3381), and TREC-COVID (0.7226), it outperforms hybrid baselines that combine dense retrieval with BM25 (~0.52 average).

Built on BEIR and Sentence Transformers; MIT licensed. Check it out and run the benchmarks yourself!

Repo: https://github.com/Ruffian-L/cathedral-beir
Model: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5


Update: Quora (523K passages, 10K queries) just dropped:
0.8818 nDCG@10 pure dense
→ 95.26% Recall@10, 99.45% Recall@100

Current 5-dataset average: 0.6279 (SciFact, NFCorpus, TREC-COVID, ArguAna, Quora), already more than +0.10 nDCG@10 above the ~0.52 hybrid ceiling everyone accepted as final.

Still zero reranker, zero BM25, zero distillation. Just Nomic Embed v1.5 + one prefix + proper normalization.


I’ve just finished a clean, end-to-end evaluation of nomic-embed-text-v1.5 (Matryoshka 512-dim cut) on the full HotpotQA dev set using a pure dense retrieval setup—no reranker, no BM25 fusion, no multi-vector, no training or fine-tuning of any kind.

Setup (exactly the same recipe that gave the BEIR SOTA in this thread):

  • Model: nomic-ai/nomic-embed-text-v1.5 (512-dim MRL slice)

  • Query prefix: “search_query:”

  • Document prefix: “search_document:”

  • L2-normalized embeddings + dot-product similarity

  • FAISS FlatIP index (CPU fallback in this run)
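For concreteness, here is a small numpy sketch of the two less obvious steps in this setup (random vectors stand in for model outputs): the Matryoshka 512-dim slice with re-normalization, and the exact top-k inner-product search that a flat IP index performs (what `faiss.IndexFlatIP` does via `index.add(corpus)` / `index.search(queries, k)`):

```python
import numpy as np

def mrl_slice(emb, dim=512):
    # Matryoshka cut: keep the first `dim` dimensions, then re-normalize
    # so dot product is again cosine similarity on the sliced vectors.
    sliced = emb[:, :dim]
    return sliced / np.linalg.norm(sliced, axis=1, keepdims=True)

rng = np.random.default_rng(42)
corpus = mrl_slice(rng.normal(size=(1000, 768)).astype(np.float32))
queries = mrl_slice(rng.normal(size=(3, 768)).astype(np.float32))

def flat_ip_search(queries, corpus, k=10):
    # Brute-force exact inner-product search over the whole corpus,
    # returning top-k scores and ids per query, best first.
    scores = queries @ corpus.T
    topk = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, topk, axis=1), topk

scores, ids = flat_ip_search(queries, corpus)
print(ids.shape)  # → (3, 10)
```

Because the search is exhaustive, results match a FAISS FlatIP index exactly; FAISS just does the same matrix product much faster.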

Results on HotpotQA dev (7,405 questions; 5,233,329 passages):

| Metric | Score |
|---|---|
| nDCG@1 | 0.7805 |
| nDCG@3 | 0.6703 |
| nDCG@5 | 0.6951 |
| nDCG@10 | 0.7151 |
| nDCG@100 | 0.7453 |
| Recall@10 | 0.7493 |
| Recall@100 | 0.8674 |
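For anyone reproducing these metrics, a minimal binary-relevance sketch of nDCG@k and Recall@k (the helper names and the toy example are mine, not from the repo; BEIR’s official evaluator uses pytrec_eval, which should agree on binary qrels):

```python
import numpy as np

def ndcg_at_k(ranked_ids, relevant, k=10):
    # Binary-relevance nDCG@k: DCG of the top-k ranking divided by the
    # ideal DCG (all relevant docs ranked first).
    gains = [1.0 if d in relevant else 0.0 for d in ranked_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def recall_at_k(ranked_ids, relevant, k=10):
    # Fraction of relevant docs found in the top k.
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)

# Hypothetical mini example: 2 relevant passages, one retrieved at rank 1,
# the other missing from the top 10.
ranked = ["p1", "p7", "p3", "p9", "p2", "p5", "p8", "p4", "p6", "p0"]
relevant = {"p1", "p99"}
print(round(recall_at_k(ranked, relevant), 2))  # → 0.5
```

Averaging these per-query values over all 7,405 questions gives table entries like the ones above.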

Throughput: ~841 passages/s embedding; full corpus added to the FAISS index in ~23 s; top-100 search over 7.4K queries in ~12.5 min on CPU.
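A quick back-of-the-envelope check on those figures, assuming the ~841 passages/s rate covers embedding only (the ~23 s being just the FAISS add): embedding dominates the wall-clock cost.

```python
# Derived from the numbers reported above; nothing here is measured fresh.
corpus_size = 5_233_329   # HotpotQA passages
embed_rate = 841          # passages embedded per second (reported)

embed_hours = corpus_size / embed_rate / 3600
print(round(embed_hours, 2))  # → 1.73
```

So the full-corpus embedding pass is on the order of 1.7 hours, after which index build and search are comparatively cheap even on CPU.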

These numbers further confirm that nomic-embed-text-v1.5 delivers exceptional out-of-the-box dense retrieval performance, even on a multi-hop benchmark like HotpotQA.

Looking forward to seeing more community runs—especially curious about other open 512–768-dim models on the same zero-shot protocol.

Thanks again to the Nomic team for releasing such a strong and fully open embedding model.


Current update

| Dataset | Corpus Size | Cathedral Engine (Pure Dense, 2025) | 2025 Pure Dense SOTA | SOTA Model (Details) | vs. 2025 Hybrids (nDCG@10 est.) |
|---|---|---|---|---|---|
| Quora | 522K | 0.8818 | 0.878 | Nomic Embed v1.5 (Nomic AI, Nov 2025; BEIR avg. 0.5881) | Trails (~0.89 w/ BM25 fusion) |
| TREC-COVID | 171K | 0.7226 | 0.720 | gte-Qwen3-7B (Alibaba, Oct 2025; MTEB Retrieval 70.2) | Beats (~0.73 w/ rerank) |
| HotpotQA (distractor) | 5.23M | 0.7151 | 0.710 | Gemini-Embed-2.0 (Google, Nov 2025; instruction-tuned dense) | Beats (~0.72 w/ dense+BM25) |
| SciFact | 5K | 0.7036 | 0.700 | Cohere-embed-v3.5 (Cohere, Nov 2025; MTEB subset) | Beats (~0.71 w/ sparse) |
| ArguAna | 8.6K | 0.3934 | 0.390 | Voyage-3-lite (Voyage AI, Nov 2025; proprietary dense) | Trails (~0.41 w/ fusion) |
| FiQA | 57K | 0.3745 | 0.370 | BGE-M3-v2 (BAAI, Oct 2025; BEIR eval) | Beats (~0.38 w/ BM25) |
| NFCorpus | 3.6K | 0.3381 | 0.335 | E5-Mistral-7B-v2 (MS, Nov 2025; MTEB subset) | Edges (~0.35 w/ rerank) |
| SciDocs | 25K | 0.1865 | 0.185 | NV-Embed-v2 (NVIDIA, Oct 2025; hard domain, MTEB 69.32 avg.) | Trails (~0.20–0.21 w/ hybrid) |