DPR retriever module

madabhuc · August 7, 2020, 5:00pm

I see that https://github.com/huggingface/transformers/pull/5279 describes the DPR flow.

Just checking to see when the retriever module will be available.
Many thanks for making DPR available!

Jung · November 6, 2020, 10:36pm

I see this question was already answered on GitHub by Quentin,
so I'm adding his answer here for convenience.

The retriever is now part of the datasets library (formerly named nlp).
You can install it with
pip install datasets
and load the retriever:

from datasets import load_dataset

wiki = load_dataset("wiki_dpr", with_embeddings=False, with_index=True, split="train")

The retriever is basically a dense index over Wikipedia passages.
To query it using the DPR question encoder, you can do:

from transformers import DPRQuestionEncoderTokenizer, DPRQuestionEncoder

# Load the pretrained DPR question encoder and its tokenizer
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question = "What is love ?"

# Encode the question; output [0] is the pooled question embedding, converted to a NumPy array
question_emb = question_encoder(**question_tokenizer(question, return_tensors="pt"))[0].detach().numpy()

# Get the k nearest passages (and their scores) from the "embeddings" index
passages_scores, passages = wiki.get_nearest_examples("embeddings", question_emb, k=20)
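
To give a feel for what the nearest-neighbor lookup does, here is a minimal pure-Python sketch of dense retrieval by inner product, which is the similarity DPR uses to score passages. It is an illustration only, not how the datasets library implements its index: the helper names (inner_product, get_nearest), the passage ids, and the 3-dimensional toy vectors are all made up for this example, and the real index may use an approximate search rather than the exhaustive scan shown here.

```python
from typing import Dict, List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def get_nearest(
    index: Dict[str, List[float]], query: List[float], k: int
) -> List[Tuple[float, str]]:
    """Hypothetical helper: score every passage against the query
    and return the k highest-scoring (score, passage_id) pairs."""
    scored = [(inner_product(emb, query), pid) for pid, emb in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "index": passage id -> embedding (made-up 3-d vectors,
# standing in for the real passage embeddings).
toy_index = {
    "passage_a": [1.0, 0.0, 0.0],
    "passage_b": [0.0, 1.0, 0.0],
    "passage_c": [0.7, 0.7, 0.0],
}

# Toy question embedding, standing in for question_emb above.
toy_question_emb = [0.9, 0.1, 0.0]

top = get_nearest(toy_index, toy_question_emb, k=2)
for score, pid in top:
    print(f"{pid}: {score:.2f}")
```

In this toy setup passage_a ranks first and passage_c second, because their vectors point in the direction most similar to the question vector.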
