I have a requirement that the model should search for relevant documents to answer a query, and I found RAG from Facebook AI, which fits my use case perfectly. I also found this post in which HuggingFace explains RAG, and learned that HF has implemented RAG, which is awesome!
My question is whether I could extend this functionality so that the model retrieves from local documents rather than from HF’s Wikipedia corpus.
Are there any notebooks to refer to?
hey @saichandra, one possibility would be to use Haystack’s implementation of RAG (which is based on HF transformers), e.g. see here: https://haystack.deepset.ai/docs/latest/tutorial7md
one advantage of using Haystack is that they provide a nice API for FAISS (and other document stores) so you can store the embeddings locally with just a few lines of code.
i’ve had mixed results from using RAG with the Natural Questions checkpoints (the answers are often gibberish). if you’re doing QA on a specialised corpus, you might be better off using the classic Retriever-Reader architecture or fine-tuning RAG on your domain.
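To make the Retriever-Reader idea concrete, here's a minimal stdlib-only sketch (my own illustration, not Haystack's or HF's code). A TF-IDF retriever ranks your local documents by cosine similarity to the query, and a hypothetical `toy_reader` stands in for a real extractive QA model (e.g. a transformers question-answering pipeline) by returning the sentence with the most query-word overlap. The documents and query are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Sparse retriever: ranks documents by TF-IDF cosine similarity to the query."""

    def __init__(self, docs):
        self.docs = docs
        tokenized = [tokenize(d) for d in docs]
        n = len(docs)
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF: terms that occur in every document still get weight 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens):
        # L2-normalised TF-IDF vector; unknown query terms get weight 0.
        tf = Counter(tokens)
        vec = {t: count * self.idf.get(t, 0.0) for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def retrieve(self, query, k=2):
        q = self._vectorize(tokenize(query))
        scores = [sum(w * doc.get(t, 0.0) for t, w in q.items()) for doc in self.vectors]
        ranked = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def toy_reader(query, passages):
    """Hypothetical stand-in for an extractive QA model: returns the sentence
    (across all retrieved passages) with the largest word overlap with the query."""
    q_terms = set(tokenize(query))
    sentences = [s.strip() for p in passages
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    return max(sentences, key=lambda s: len(q_terms & set(tokenize(s))))


docs = [
    "Our VPN requires the Cisco client. Install it from the portal.",
    "Expense reports are due on the fifth business day of each month.",
    "The cafeteria opens at 8am.",
]
retriever = TfidfRetriever(docs)
query = "When are expense reports due?"
passages = retriever.retrieve(query, k=2)
print(toy_reader(query, passages))
# prints: Expense reports are due on the fifth business day of each month.
```

The point of the split is that the retriever only has to narrow things down cheaply, so you can swap in BM25 or a dense retriever and a proper fine-tuned reader independently.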
You can easily do this by encoding your local knowledge base (KB). Please follow the use_own_knowledge_dataset.py example script.
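As far as I remember, the first thing that script does is split each document into passages of roughly 100 words before embedding them. Here's a small sketch of that chunking step, modeled on my reading of the script (so treat the details as approximate). It uses the columnar {"title": [...], "text": [...]} batch format that `datasets`' `map(batched=True)` works with, and the sample document is made up.

```python
def split_text(text, n=100, character=" "):
    """Split text into chunks of at most n pieces, splitting on `character`."""
    words = text.split(character)
    return [character.join(words[i : i + n]).strip() for i in range(0, len(words), n)]


def split_documents(documents):
    """Columnar {"title": [...], "text": [...]} in, one row per passage out.
    Each passage keeps the title of the document it came from."""
    titles, texts = [], []
    for title, text in zip(documents["title"], documents["text"]):
        if text is None:
            continue  # skip documents without a body
        for passage in split_text(text):
            titles.append(title if title is not None else "")
            texts.append(passage)
    return {"title": titles, "text": texts}


docs = {"title": ["Handbook"], "text": [" ".join(f"w{i}" for i in range(250))]}
out = split_documents(docs)
print([len(p.split()) for p in out["text"]])  # prints [100, 100, 50]
```

After chunking, the script (again, from memory) embeds each passage with a DPR context encoder, adds a FAISS index over the embeddings, and loads the result through `RagRetriever.from_pretrained(..., index_name="custom", ...)`, so check the script itself for the exact arguments.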
to add to what @shamanez said, i also recently learnt that you can add FAISS indices directly to a dataset in the datasets library: Adding a FAISS or Elastic Search index to a Dataset (datasets 1.6.0 documentation)
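Conceptually, a flat inner-product index just scores the query against every stored embedding and returns the top k; DPR-style embeddings are compared by inner product. Here's a dependency-free sketch of that exhaustive search (a stand-in for illustration, not the FAISS or datasets API), with made-up 3-d vectors in place of real passage embeddings.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class FlatInnerProductIndex:
    """Exhaustive maximum-inner-product search over stored vectors.
    Same result a flat inner-product index would give, without FAISS's speed."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        self.vectors.append(list(vector))
        self.payloads.append(payload)

    def search(self, query, k=1):
        # Score every stored vector, then keep the k highest scores.
        scores = [dot(query, v) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [scores[i] for i in top], [self.payloads[i] for i in top]


index = FlatInnerProductIndex()
index.add([0.9, 0.1, 0.0], "passage about VPN setup")
index.add([0.1, 0.8, 0.1], "passage about expenses")
index.add([0.0, 0.2, 0.9], "passage about the cafeteria")
scores, passages = index.search([0.2, 0.9, 0.0], k=2)
print(passages)  # prints ['passage about expenses', 'passage about VPN setup']
```

With the datasets library the equivalent is roughly `dataset.add_faiss_index(column="embeddings")` followed by `dataset.get_nearest_examples("embeddings", query_vector, k=2)`, which returns the scores and the matching rows. Check the docs page above for how to choose the metric (or pass your own FAISS index), since an L2 index and an inner-product index can rank passages differently.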