RAG Retriever: hf vs legacy vs exact vs compressed

In the eval_rag.py file under examples, I notice that the choices for the index_name argument are “hf” and “legacy”. How are these different from “exact” and “compressed”?

Hi! That question was answered here: RAG Retriever: Exact vs. Compressed Index? - #4 by lhoestq

Hi Jung! I did see that post; that’s what created the confusion for me. The example uses “hf” vs “legacy”, but that post talks about “exact” vs “compressed”. It would be great if this could be clarified.

Hi @sashank06, sorry I misunderstood the question.
As far as I understand from the source code, the class LegacyIndex refers to the original index used in the RAG/DPR papers, while the class HFIndexBase allows us to use custom datasets (there’s a dataset argument).

Therefore, according to the above quote, I understand that “exact” and “compressed” are subtypes of “legacy”. I may have misunderstood, and it would be great if @lhoestq could help clarify here :smiley:
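
For illustration, here is a hedged sketch of the custom-dataset path that HFIndexBase enables, modeled on the use_own_knowledge_dataset.py example in the transformers repo. The toy passages and random 768-d embeddings are placeholders; real retrieval needs embeddings produced by a DPRContextEncoder:

```python
import numpy as np
from datasets import Dataset
from transformers import RagRetriever

# Toy passage collection with the columns RAG expects: "title", "text"
# and "embeddings". The random vectors stand in for real DPR context
# embeddings (768-dimensional for the standard DPR encoder).
passages = Dataset.from_dict({
    "title": ["Aaron", "Zeus"],
    "text": ["Aaron is a prophet...", "Zeus is a Greek god..."],
    "embeddings": [np.random.rand(768).astype("float32").tolist() for _ in range(2)],
})
passages.add_faiss_index(column="embeddings")  # build a FAISS index over the embeddings

# Plug the indexed dataset into the retriever via the custom-index path.
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="custom",
    indexed_dataset=passages,
)
```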

Hi! That’s a mistake in the eval_rag.py parameter choices. As specified in the RAG configuration (see the documentation), one can choose between ‘legacy’, ‘exact’ and ‘compressed’. The legacy index is the original index used for RAG/DPR, while the other two use the indexing implementation of the datasets library.

link to the PR that fixes the eval_rag.py parameter description: https://github.com/huggingface/transformers/pull/8730
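
For reference, a minimal sketch of selecting one of these index types through RagRetriever; use_dummy_dataset=True swaps in a tiny test index so you can experiment without downloading the full wiki_dpr dataset:

```python
from transformers import RagRetriever

# index_name is one of "legacy", "exact" or "compressed".
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="exact",
    use_dummy_dataset=True,  # remove this to download the full index
)
```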

Hi Quentin!

We saw your post (RAG Retriever: Exact vs. Compressed Index?) saying that the exact index has to be used to replicate the paper’s results.

However, the Hugging Face documentation (RAG — transformers 4.11.2 documentation) says that the legacy index replicates the paper’s results.

Also, if the compressed index uses the same wiki dump and its FAISS index is loaded in RAM, why can’t it be used to replicate the paper’s results?

While using legacy to load the pretrained RAG retriever, we ran into “MemoryError: std::bad_alloc”.

Could you kindly help us resolve this confusion? We are trying to load the RAG retriever’s wiki dump for text-based question answering.

Also, the wiki dump seems to be huge (140 GB), and we don’t have that much storage available right now.

Is there a text-based wiki dump that can replicate the paper’s results while being smaller than 140 GB?

Looking forward to your guidance and help. Many thanks for your time!

Regards

> We saw your post (RAG Retriever: Exact vs. Compressed Index?) saying that the exact index has to be used to replicate the paper’s results.
> However, the Hugging Face documentation (RAG — transformers 4.11.2 documentation) says that the legacy index replicates the paper’s results.

The compressed index has lower retrieval performance than the exact one. That’s why you need to use either the exact or the legacy one to replicate RAG’s performance.

> While using legacy to load the pretrained RAG retriever, we ran into “MemoryError: std::bad_alloc”.

The legacy index is 35 GB and needs to fit entirely in RAM, so make sure you have enough memory when using it.
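
As an illustration, here is a minimal sketch of checking available memory before loading the legacy index; it assumes psutil is installed, and uses the 35 GB figure above as the threshold:

```python
import psutil
from transformers import RagRetriever

# The legacy FAISS index is loaded entirely into memory (~35 GB),
# so fail early with a clear message if there isn't enough RAM.
available_gb = psutil.virtual_memory().available / 1024**3
if available_gb < 35:
    raise MemoryError(
        f"Only {available_gb:.1f} GB of RAM available; the legacy index needs ~35 GB."
    )

retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="legacy"
)
```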

> Is there a text-based wiki dump that can replicate the paper’s results while being smaller than 140 GB?

You can use the wiki dump from the legacy index.
I think this one takes less disk space because its embeddings are stored quantized inside the FAISS index, whereas the other indexes store the plain embeddings (around 70 GB to download, plus another 70 GB to convert to an Arrow dataset file). So you would need around 10 GB for the text plus the FAISS index (35 GB).

Thanks a lot for your response, that’s extremely helpful! May I also ask about the RAM requirements for the ‘exact’ and ‘compressed’ indexes? Much appreciated.

It takes 35 GB for the exact one and 3 GB for the compressed one :slight_smile:
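
So for a low-RAM setup, a sketch along these lines would load the compressed index, at some cost in retrieval accuracy as noted above:

```python
from transformers import RagRetriever

# The compressed index stores quantized embeddings, so it only needs ~3 GB
# of RAM (vs. ~35 GB for "exact"/"legacy"), with lower retrieval performance.
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="compressed",
)
```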