RAG Retriever : Exact vs. Compressed Index?

Hi Guys,

With the command
retriever = RagRetriever.from_pretrained("facebook/rag-token-base", index_name="compressed", use_dummy_dataset=True)

we can set index_name to either “compressed” or “exact”. What is the difference between the two?

I also found in some topic (I cannot find it now) that @lhoestq suggested using the “compressed” index to match the performance of the paper; why is that the case?

Hi ! exact vs compressed refers to the quantization used for the FAISS index. The compressed one uses an IVF index with product quantization and requires significantly less RAM than the exact one. To reproduce the RAG paper’s results you will need the exact one though.
Note that I will update the parameters of both indexes this week so that the exact one uses the same settings as the RAG paper, and also to provide an optimized compressed one.
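To see why the two index types differ so much in RAM, here is a hedged back-of-envelope sketch: a flat (“exact”) index keeps every vector in full float32, while IVF-PQ (“compressed”) keeps only a short code per vector. The 128 bytes/vector figure below is an assumption for illustration, not the actual index configuration.

```python
# Back-of-envelope index sizes (assumed parameters, not the real configs).
n_vectors = 21_000_000        # ~21M wikipedia passages
dim = 768                     # embedding dimension

# "exact": a flat index stores every vector in full float32 precision.
flat_bytes = n_vectors * dim * 4

# "compressed": IVF + product quantization stores a short code per vector;
# assume 128 bytes/vector here (the real setting may differ).
code_bytes_per_vector = 128
pq_bytes = n_vectors * code_bytes_per_vector

print(f"flat index:   {flat_bytes / 1e9:.1f} GB")   # ~64.5 GB
print(f"IVF-PQ index: {pq_bytes / 1e9:.1f} GB")     # ~2.7 GB
```

The rough ~2.7GB figure lines up with the “around 3GB” compressed index mentioned below; the trade-off is that quantized codes lose precision, which is why the exact index is needed to reproduce the paper’s numbers.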


Thanks Quentin for the reply!

I have one question:
when I tried to download the “full” version of the compressed index, it still downloads 79GB of wikipedia documents. Currently I only use Colab with the dummy index, but I plan to rent a cloud machine to test the full-scale index too. Could you please suggest how much RAM I would need for the full compressed index? (and maybe some extra RAM for fine-tuning?) @lhoestq

The compressed index is around 3GB, so you don’t need that much RAM actually. The index is the structure used to do retrieval; it doesn’t store the wikipedia texts.

On the other hand, the 79GB of data correspond to the 21M wikipedia texts and their 768-dim representations. Those data are not loaded into RAM; they’re just memory-mapped from disk. Memory mapping gives fast I/O without filling up RAM.
