I see this topic was already answered on GitHub by Quentin, so I'd like to add the answer here for convenience.
The retriever is now part of the datasets library (formerly nlp). You can install it with pip install datasets and load the retriever:
from datasets import load_dataset
wiki = load_dataset("wiki_dpr", with_embeddings=False, with_index=True, split="train")
The retriever is basically a dense index over Wikipedia passages.
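If you want to peek at what is in the index, you can inspect the dataset directly. A quick sketch (the column names "title" and "text" are what the wiki_dpr dataset card lists, so double-check them on your version):

# Inspect the loaded passages (assumes "title" and "text" columns exist in wiki_dpr)
print(wiki)                    # row count and available columns
print(wiki[0]["title"])        # title of the first passage
print(wiki[0]["text"][:200])   # first 200 characters of its text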
To query it using the DPR question encoder, you can do:
from transformers import DPRQuestionEncoderTokenizer, DPRQuestionEncoder
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question = "What is love ?"
question_emb = question_encoder(**question_tokenizer(question, return_tensors="pt"))[0].detach().numpy()
passages_scores, passages = wiki.get_nearest_examples("embeddings", question_emb, k=20) # get k nearest neighbors
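get_nearest_examples returns the scores together with the retrieved examples as a dict of columns. A minimal sketch for printing the top hits, assuming the same "title" and "text" columns as above:

# passages is a dict mapping column names to lists, one entry per retrieved passage
for score, title, text in zip(passages_scores, passages["title"], passages["text"]):
    print(f"{score:.2f}  {title}: {text[:100]}")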