In the eval_rag.py file under examples, I notice that the argument choices for index_name are “hf” or “legacy”. How are these different from “exact” vs. “compressed”?
Hi! That question was answered here: RAG Retriever : Exact vs. Compressed Index? - #4 by lhoestq
Hi Jung! I did see that post; that’s what created the confusion for me. The example uses “hf” vs. “legacy”, but that post mentions “exact” vs. “compressed”. If this could be clarified, that would be great.
Hi @sashank06, sorry I misunderstood the question.
As far as I understand from the source code, class LegacyIndex refers to the original index used in the RAG/DPR papers, while class HFIndexBase allows us to use custom datasets (there’s a dataset argument).
Therefore, according to the above, I understand “exact” and “compressed” to be subtypes of “legacy”. I may have misunderstood, and it would be great if @lhoestq could help clarify here.
Hi ! that’s a mistake in the eval_rag.py parameters choices. As specified in the rag configuration (see documentation), one can choose between ‘legacy’, ‘exact’ and ‘compressed’. The legacy index is the original index used for RAG/DPR while the other two use the datasets
library indexing implementation.
Here is the link to the PR that fixes the eval_rag.py parameter description: https://github.com/huggingface/transformers/pull/8730
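For illustration, here is a simplified sketch of how the three index_name values map to index implementations. The class names follow the real ones in transformers’ retrieval_rag.py, but the bodies below are illustrative stubs, not the actual implementation:

```python
# Simplified sketch of the index_name dispatch. The real classes live in
# transformers/models/rag/retrieval_rag.py; these are stand-in stubs.

class LegacyIndex:
    """Original index used in the RAG/DPR papers."""

class CanonicalHFIndex:
    """Index built with the `datasets` library ("exact" or "compressed")."""
    def __init__(self, index_name: str):
        self.index_name = index_name

def build_index(index_name: str):
    # "legacy" selects the original RAG/DPR index; the other two values
    # select the datasets-backed index in either exact or compressed form.
    if index_name == "legacy":
        return LegacyIndex()
    if index_name in ("exact", "compressed"):
        return CanonicalHFIndex(index_name)
    raise ValueError(
        f"index_name must be 'legacy', 'exact' or 'compressed', got {index_name!r}"
    )
```

So “hf” was never a valid value; “exact” and “compressed” are the two flavours of the datasets-backed index, and “legacy” sits alongside them rather than above them.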
Hi Quentin!
We saw your post saying the exact index has to be used to replicate the paper: RAG Retriever : Exact vs. Compressed Index?
However, the Hugging Face documentation says that the legacy index replicates the paper’s results: RAG — transformers 4.11.2 documentation
Also, if the compressed index uses the same wiki dump, just with the FAISS index loaded in RAM, why can it not be used to replicate the paper’s results?
While using legacy to load the pretrained RAG retriever, we ran into “MemoryError: std::bad_alloc”.
Kindly help us resolve the confusion, as we are trying to load the RAG retriever’s wiki dump for text-based question answering.
Also, the wiki dump seems to be huge (140G), and we do not have that much storage right now.
Is there a text-based wiki dump that would let us replicate the paper’s results while being smaller than 140G?
Looking forward to your guidance and help. Many thanks for your time!
Regards
We saw your post saying the exact index has to be used to replicate the paper: RAG Retriever : Exact vs. Compressed Index?
However, the Hugging Face documentation says that the legacy index replicates the paper’s results: RAG — transformers 4.11.2 documentation
The compressed index has lower retrieval performance than the exact one. That’s why you need to use either the exact or the legacy one to replicate RAG’s performance.
While using legacy to load the pretrained RAG retriever, we ran into “MemoryError: std::bad_alloc”.
The legacy index is 35GB and it needs to fit in RAM. Make sure you have enough RAM when using it.
Is there a text-based wiki dump that would let us replicate the paper’s results while being smaller than 140G?
You can use the wiki dump from the legacy index:
I think this one takes less disk space because the embeddings are stored quantized in the FAISS index (whereas the other indexes store the plain embeddings, which takes around 70GB to download and another 70GB to convert to an Arrow dataset file). So you would need around 10GB for the text plus the FAISS index (35GB).
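Putting the figures above together (all numbers are the approximate ones quoted in this thread, in GB):

```python
# Approximate disk footprint per index family, per the figures quoted above.
exact_or_compressed_gb = 70 + 70  # plain-embedding download + Arrow conversion
legacy_gb = 10 + 35               # text passages + quantized FAISS index

print(exact_or_compressed_gb, legacy_gb)  # 140 45
```

That is where the ~140G figure comes from, and why the legacy route is the smaller option on disk.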
Many thanks for your response, that’s extremely helpful! May I ask about the RAM requirements for the ‘exact’ and ‘compressed’ ones as well? Much appreciated.
It takes 35GB for the exact one and 3GB for the compressed one.
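To collect the RAM figures from this thread in one place: a rough pre-flight check one could run before loading an index. The numbers are the approximate ones quoted above, and can_load is a hypothetical helper, not a transformers API:

```python
# Approximate RAM needed to hold each FAISS index, per this thread (GB).
INDEX_RAM_GB = {"legacy": 35, "exact": 35, "compressed": 3}

def can_load(index_name: str, available_ram_gb: float) -> bool:
    """Rough check that the chosen index fits in RAM before loading it,
    to avoid failures like 'MemoryError: std::bad_alloc'."""
    return available_ram_gb >= INDEX_RAM_GB[index_name]
```

For example, a 16GB machine can hold the compressed index but not the legacy or exact one, which matches the bad_alloc error reported earlier in the thread.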