How do I use the RagRetriever to retrieve documents? (What is the question_hidden_states variable and how do I make it?)

I have a RAG model and retriever using the facebook/rag-sequence-nq model, and I have a couple of questions about how to retrieve documents. I built the model following the guide at transformers/examples/research_projects/rag/use_own_knowledge_dataset.py (huggingface/transformers on GitHub), but I am at a bit of a loss as to how to get the documents used to generate the answer it gives me. retriever.retrieve expects a question_hidden_states argument, and I can’t figure out what exactly it is or how to make it. Any advice and help would be much appreciated.

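from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

# dataset is the indexed knowledge dataset built by use_own_knowledge_dataset.py
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="custom", indexed_dataset=dataset)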
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
model.config.output_hidden_states = True
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
question2 = "What are the best antidepressants for depression?"
tokenization = tokenizer.question_encoder(question2, return_tensors="pt")
generated = model.generate(tokenization.input_ids, output_hidden_states=True)
generated_string = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

print("Q: " + question2) # Q: What are the best antidepressants for depression?
print("A: " + generated_string) # A:  selective serotonin reuptake inhibitors

docs = retriever.retrieve(tokenization, n_docs=5) # Fails because it expects a question_hidden_states?


Found an example RAG
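
For reference, here is a minimal sketch of one way to build question_hidden_states, based on my reading of the RAG documentation rather than anything confirmed in this thread. As far as I can tell, question_hidden_states is the question embedding produced by the model's question encoder (the first element of its output), and retriever.retrieve wants it as a NumPy array instead of the tokenizer output. The helper name retrieve_docs is made up for illustration, and the "title"/"text" keys assume the dataset columns created by use_own_knowledge_dataset.py.

```python
def retrieve_docs(question, tokenizer, model, retriever, n_docs=5):
    """Hypothetical helper: embed a question and look up the closest documents."""
    # Tokenize with the question-encoder tokenizer, as in the original post.
    inputs = tokenizer.question_encoder(question, return_tensors="pt")

    # The question encoder's first output is the question embedding,
    # shape (batch_size, hidden_size). This is question_hidden_states.
    question_hidden_states = model.question_encoder(inputs["input_ids"])[0]

    # retrieve() expects a NumPy array, so detach from the graph and convert.
    hidden_np = question_hidden_states.detach().cpu().numpy()

    # Returns (doc embeddings, doc ids, doc dicts); one entry per question.
    _doc_embeds, doc_ids, doc_dicts = retriever.retrieve(hidden_np, n_docs=n_docs)
    return doc_ids, doc_dicts


# Usage with the objects from the original post (not run here):
# doc_ids, doc_dicts = retrieve_docs(question2, tokenizer, model, retriever)
# for title, text in zip(doc_dicts[0]["title"], doc_dicts[0]["text"]):
#     print(title, "|", text[:100])
```

Note that this looks up the nearest documents for the question on its own; I am assuming, not asserting, that these match the documents model.generate used internally for the same question and n_docs.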