Rag model set up

Yeshwnath · November 7, 2023, 11:49pm

Hi,

Thank you for the new version of the RAG model; it’s really nice. I installed the required packages and set everything up using the code snippet below:

from transformers import AutoTokenizer, RagRetriever, RagModel
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/rag-token-base")
retriever = RagRetriever.from_pretrained("facebook/rag-token-base", index_name="exact")
model = RagModel.from_pretrained("facebook/rag-token-base", retriever=retriever)
inputs = tokenizer("what is physics?", return_tensors="pt")
outputs = model(input_ids=inputs["input_ids"])
print(outputs)

It downloaded all the required packages, which total 80 GB in size. It then started the indexing process for ‘wiki_dpr’, which is 75 GB in total. I have a few questions:

Will this indexing step run every time I run the code?

The ‘wiki_dpr’ data is already stored in my local cache folder. How can I load it into the retriever from there?

If I want to load data through datasets, how can I load a small dataset instead? The code above goes through the full indexing on every run, and it takes 12+ hours on a Windows machine.

Could you please provide guidance on how to set up the RAG model?

Thank you in advance.
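
For reference, here is a minimal sketch of the two retriever setups the questions above point toward. It is untested against this exact environment. It assumes that `RagRetriever.from_pretrained` accepts `use_dummy_dataset=True` (to load a small sample instead of the full ‘wiki_dpr’ dump) and `index_name="custom"` with `passages_path` and `index_path` (to load a dataset and FAISS index previously saved to disk), as described in the Transformers RAG documentation. The helper names `retriever_kwargs`, `load_rag`, and `demo` are hypothetical and exist only for illustration.

```python
from typing import Optional


def retriever_kwargs(passages_path: Optional[str] = None,
                     index_path: Optional[str] = None) -> dict:
    """Build keyword arguments for RagRetriever.from_pretrained (hypothetical helper).

    No paths given: use the small dummy dataset instead of the full wiki_dpr.
    Both paths given: load a custom dataset and its saved FAISS index from disk.
    """
    if passages_path is None and index_path is None:
        return {"index_name": "exact", "use_dummy_dataset": True}
    if passages_path is None or index_path is None:
        raise ValueError("passages_path and index_path must be given together")
    return {
        "index_name": "custom",
        "passages_path": passages_path,  # dataset written with save_to_disk (assumption)
        "index_path": index_path,        # FAISS index saved from that dataset (assumption)
    }


def load_rag(model_name: str = "facebook/rag-token-base", **paths):
    """Load tokenizer and model with a retriever configured by retriever_kwargs."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoTokenizer, RagModel, RagRetriever

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    retriever = RagRetriever.from_pretrained(model_name, **retriever_kwargs(**paths))
    model = RagModel.from_pretrained(model_name, retriever=retriever)
    return tokenizer, model


def demo():
    """Run the original snippet against the dummy dataset (downloads model weights)."""
    tokenizer, model = load_rag()
    inputs = tokenizer("what is physics?", return_tensors="pt")
    outputs = model(input_ids=inputs["input_ids"])
    print(outputs)
```

Calling `demo()` runs the same forward pass as the snippet above but against the dummy dataset. Calling `load_rag(passages_path=..., index_path=...)` with your own saved dataset and index would take the custom route.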
