I see that https://github.com/huggingface/transformers/pull/5279 describes the DPR flow.
Just checking when the retriever module will be available.
Many thanks for making DPR available!
I see this question was already answered on GitHub by Quentin, so I'm adding the answer here for convenience:
The retriever is now part of the nlp library (since renamed to datasets).
You can install it with pip install datasets
and load the retriever:
from datasets import load_dataset
wiki = load_dataset("wiki_dpr", with_embeddings=False, with_index=True, split="train")
The retriever is basically a dense index over wikipedia passages.
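To make "dense index" a bit more concrete, here is a minimal sketch of the idea using plain NumPy rather than the real index. It assumes inner-product similarity between a question embedding and passage embeddings; the helper name nearest_passages and the toy 3-dimensional vectors are made up for illustration and are not part of datasets or transformers. The real wiki_dpr index does an approximate version of this search over millions of passages.

```python
import numpy as np

def nearest_passages(passage_embs, question_emb, k=2):
    """Return (scores, indices) of the k passages with the highest inner product.

    Hypothetical helper for illustration only; not a datasets API.
    """
    scores = passage_embs @ question_emb   # one similarity score per passage
    top = np.argsort(-scores)[:k]          # passage indices, best score first
    return scores[top], top

# Toy embeddings for four passages (made-up numbers, one row per passage).
passage_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
question_emb = np.array([0.9, 0.1, 0.0])

scores, indices = nearest_passages(passage_embs, question_emb, k=2)
```

With these toy vectors, the passages at rows 0 and 2 come back first because they point in roughly the same direction as the question vector.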
To query it using the DPR question encoder you can do:
from transformers import DPRQuestionEncoderTokenizer, DPRQuestionEncoder
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question = "What is love ?"
question_emb = question_encoder(**question_tokenizer(question, return_tensors="pt"))[0].detach().numpy()
passages_scores, passages = wiki.get_nearest_examples("embeddings", question_emb, k=20) # get k nearest neighbors
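As a possible follow-up, the returned values can be paired up for inspection. The sketch below assumes that passages is a dict mapping column names to lists and that wiki_dpr exposes "title" and "text" columns; please verify against your loaded dataset. The helper format_results and the stand-in data are hypothetical, used here so the snippet runs without downloading the index.

```python
def format_results(scores, passages, max_chars=80):
    """Pair each score with its passage title and a short text preview.

    Assumes `passages` has "title" and "text" keys (an assumption about wiki_dpr).
    """
    lines = []
    for score, title, text in zip(scores, passages["title"], passages["text"]):
        lines.append(f"{score:.2f}  {title}: {text[:max_chars]}")
    return lines

# Stand-in data shaped like (passages_scores, passages); values are made up.
demo_scores = [81.5, 79.25]
demo_passages = {
    "title": ["Love", "Romance"],
    "text": ["Love is a feeling.", "Romance is a genre."],
}

for line in format_results(demo_scores, demo_passages):
    print(line)
```

With the real query you would call format_results(passages_scores, passages) instead of passing the stand-in data.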