I see https://github.com/huggingface/transformers/pull/5279 that describes the DPR flow.
Just checking to see when the retriever module will be available.
Many thanks for making DPR available!
I see this topic was already answered on GitHub by Quentin, so I'd like to add the answer here for convenience.
The retriever is now part of the datasets library (formerly named nlp).
You can install it with pip install datasets and load the retriever:
from datasets import load_dataset
wiki = load_dataset("wiki_dpr", with_embeddings=False, with_index=True, split="train")
The retriever is basically a dense index over Wikipedia passages.
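To make "dense index" a bit more concrete, here is a minimal sketch of the idea in plain NumPy. It is not how wiki_dpr is implemented (the real index is built with FAISS over millions of passages); it only shows the principle that each passage is stored as a vector and retrieval returns the passages whose vectors score highest against the question vector. The toy embeddings and the function name nearest_passages are made up for illustration.

```python
import numpy as np

def nearest_passages(passage_embs, question_emb, k):
    """Brute-force search: return (scores, indices) of the k passages
    with the highest inner product against the question embedding."""
    scores = passage_embs @ question_emb   # one score per passage
    top = np.argsort(-scores)[:k]          # indices ordered by descending score
    return scores[top], top

# Toy 3-dimensional embeddings for four passages (hypothetical values).
passage_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
question_emb = np.array([1.0, 0.2, 0.0])

scores, indices = nearest_passages(passage_embs, question_emb, k=2)
print(indices, scores)
```

A real index avoids scoring every passage on every query, which is what makes search over all of Wikipedia practical.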
To query it using the DPR question encoder you can do:
from transformers import DPRQuestionEncoderTokenizer, DPRQuestionEncoder
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question = "What is love ?"
question_emb = question_encoder(**question_tokenizer(question, return_tensors="pt"))[0].detach().numpy()
passages_scores, passages = wiki.get_nearest_examples("embeddings", question_emb, k=20) # get k nearest neighbors
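To look at what came back, something like the following sketch may help. It assumes that passages is a dict of columns and that wiki_dpr exposes "title" and "text" columns; check wiki.column_names if in doubt. The helper format_results is a hypothetical name, and the stand-in data below only mimics the shape of the real result so the snippet runs without downloading the dataset.

```python
def format_results(scores, passages, max_chars=80):
    """Build one readable line per retrieved passage.

    Assumes `passages` is a dict with "title" and "text" lists
    (an assumption about the wiki_dpr columns)."""
    lines = []
    rows = zip(scores, passages["title"], passages["text"])
    for rank, (score, title, text) in enumerate(rows, start=1):
        lines.append(f"{rank}. [{score:.2f}] {title}: {text[:max_chars]}")
    return lines

# Stand-in data with the same shape as the real result (made-up values).
fake_scores = [80.5, 79.25]
fake_passages = {
    "title": ["Love", "Love (song)"],
    "text": ["Love encompasses a range of strong emotional states.",
             "A song about love."],
}

lines = format_results(fake_scores, fake_passages)
for line in lines:
    print(line)
```

With the real objects you would call format_results(passages_scores, passages) instead.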