RAG Retriever: hf vs legacy vs exact vs compressed

We saw your post saying the exact index has to be used to replicate the paper (RAG Retriever: Exact vs. Compressed Index?).
However, the HuggingFace documentation says that the legacy index replicates the paper's results (RAG — transformers 4.11.2 documentation).

The compressed index has lower retrieval performance than the exact one, so you need either the exact or the legacy index to replicate RAG's performance.
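A minimal sketch of selecting the index when loading the pretrained retriever. The `index_name` values and `use_dummy_dataset` flag come from the transformers `RagRetriever` API; the helper function and its validation are illustrative, not part of the library:

```python
# Valid index_name values per the transformers RAG documentation.
VALID_INDEX_NAMES = {"exact", "compressed", "legacy"}


def load_rag_retriever(index_name: str = "exact", dummy: bool = True):
    """Load the pretrained NQ RAG retriever with the chosen index.

    index_name="exact" or "legacy" reproduces the paper's numbers;
    "compressed" trades retrieval quality for a smaller memory footprint.
    """
    if index_name not in VALID_INDEX_NAMES:
        raise ValueError(f"index_name must be one of {VALID_INDEX_NAMES}")
    # Heavy import kept local so validation stays cheap.
    from transformers import RagRetriever

    return RagRetriever.from_pretrained(
        "facebook/rag-token-nq",
        index_name=index_name,
        # use_dummy_dataset=True pulls a tiny dummy dataset, so you can
        # smoke-test the pipeline without the full wiki dump.
        use_dummy_dataset=dummy,
    )
```

With `dummy=False`, the "exact" and "compressed" configurations trigger the full wiki_dpr download discussed below.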

While loading the pretrained RAG retriever with the legacy index, we hit "MemoryError: std::bad_alloc".

The legacy index is 35GB and has to fit entirely in RAM, so make sure your machine has enough memory before loading it.
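You can sanity-check available memory before attempting the load. This sketch uses a Linux-only `sysconf` lookup; the 35GB requirement is the figure quoted above:

```python
import os

# The legacy FAISS index must fit in RAM (figure from this thread).
LEGACY_INDEX_GB = 35


def total_ram_gb() -> float:
    """Total physical memory in GB (Linux-only, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def check_ram(required_gb: float = LEGACY_INDEX_GB) -> float:
    """Raise early with a clear message instead of a std::bad_alloc later."""
    ram = total_ram_gb()
    if ram < required_gb:
        raise MemoryError(
            f"only {ram:.1f}GB RAM available, need ~{required_gb}GB "
            "to hold the legacy index in memory"
        )
    return ram
```

Failing fast here gives a readable Python error instead of the opaque `std::bad_alloc` raised from inside FAISS.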

Is there any text-based wiki dump that lets us replicate the paper's results while being smaller than 140GB?

You can use the wiki dump from the legacy index.
It takes less disk space because its embeddings are stored quantized inside the FAISS index, whereas the other indexes store the plain embeddings: those take around 70GB to download and another 70GB to convert to an Arrow dataset file. With legacy you end up with around 10GB of text passages plus the 35GB FAISS index.
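The disk arithmetic above can be summarized in a few lines; the figures are the approximate ones quoted in this thread, not exact file sizes:

```python
# Rough disk footprints (GB), per the numbers discussed in this thread.
footprint_gb = {
    # exact/compressed path: plain-embedding download + Arrow conversion
    "exact": 70 + 70,
    # legacy path: text passages (~10GB) + quantized FAISS index (~35GB)
    "legacy": 10 + 35,
}

savings = footprint_gb["exact"] - footprint_gb["legacy"]
print(f"legacy saves roughly {savings}GB of disk")  # roughly 95GB
```

The trade-off: legacy saves disk by quantizing embeddings into the index, but that 35GB index still has to sit in RAM, as noted above.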