Using RAG with local documents

saichandra · April 6, 2021, 12:02pm

Hi,
I have a requirement that the model should search for relevant documents to answer a query, and I found RAG from Facebook AI, which fits my use case perfectly. I also found this post in which Hugging Face explains RAG, and learned that HF has implemented RAG, which is awesome!
My question is whether I can extend this functionality so that the model retrieves from local documents rather than from HF’s Wikipedia corpus.
Are there any notebooks to refer to?

lewtun · April 6, 2021, 9:19pm

hey @saichandra, one possibility would be to use Haystack’s implementation of RAG (which is based on HF transformers), e.g. see here: https://haystack.deepset.ai/docs/latest/tutorial7md

one advantage of using Haystack is that they provide a nice API for FAISS (and other document stores) so you can store the embeddings locally with just a few lines of code.

i’ve had mixed results from using RAG with the Natural Questions checkpoints (the answers are often gibberish). if you’re doing QA on a specialised corpus, you might be better off using the classic Retriever-Reader architecture or fine-tuning RAG on your domain.

shamanez · April 21, 2021, 1:00pm

You can easily do this by encoding your local knowledge base (KB). Please follow the use_own_knowledge_dataset.py script.
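For anyone wondering what "encoding your local knowledge base" starts from, here is a minimal sketch of the preparation step only: splitting local documents into short passages with parallel `title` and `text` columns. The 100-word default, the `(title, text)` input shape, and the helper names `split_text` and `split_documents` are assumptions for illustration, not a copy of the script. The passages would then be embedded with a context encoder and indexed.

```python
from typing import Dict, List, Tuple


def split_text(text: str, n_words: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most n_words words."""
    words = text.split()
    return [" ".join(words[i:i + n_words]) for i in range(0, len(words), n_words)]


def split_documents(
    docs: List[Tuple[str, str]], n_words: int = 100
) -> Dict[str, List[str]]:
    """Turn (title, text) documents into parallel 'title' / 'text' passage columns."""
    titles: List[str] = []
    texts: List[str] = []
    for title, text in docs:
        for passage in split_text(text, n_words):
            titles.append(title)
            texts.append(passage)
    return {"title": titles, "text": texts}


# Hypothetical local documents; in practice these would be read from your own files.
local_docs = [
    ("Returns policy", "Items can be returned within thirty days of delivery"),
    ("Shipping", "Orders ship in two business days"),
]

# A tiny passage length is used here only so the split is visible on short texts.
passages = split_documents(local_docs, n_words=5)
```

Each passage keeps the title of the document it came from, so a retrieved passage can be traced back to its source.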

lewtun · April 21, 2021, 1:50pm

to add to what @shamanez said, i also recently learnt that you can add faiss indices directly in the datasets library: Adding a FAISS or Elastic Search index to a Dataset — datasets 1.6.0 documentation
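A rough sketch of what that looks like with the datasets library, assuming `datasets`, `numpy` and `faiss-cpu` are installed. `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) so the example stays small; for real use you would replace it with a proper encoder such as a DPR context encoder. `build_indexed_dataset` and `search` are also illustrative names, not library functions.

```python
import zlib
from typing import Dict, List

DIM = 16  # toy embedding size, chosen arbitrarily for this sketch


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hashed bag-of-words vector; a placeholder for a real sentence encoder."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def build_indexed_dataset(passages: Dict[str, List[str]]):
    """Create a Dataset with an 'embeddings' column and attach a FAISS index to it."""
    # Imported lazily so the helper above works without the optional dependencies.
    from datasets import Dataset

    columns = dict(passages)
    columns["embeddings"] = [toy_embed(text) for text in passages["text"]]
    dataset = Dataset.from_dict(columns)
    dataset.add_faiss_index(column="embeddings")
    return dataset


def search(dataset, query: str, k: int = 2):
    """Return (scores, examples) for the k passages nearest to the query."""
    import numpy as np

    query_vec = np.asarray(toy_embed(query), dtype=np.float32)
    return dataset.get_nearest_examples("embeddings", query_vec, k=k)
```

Usage would be along the lines of `ds = build_indexed_dataset({"title": [...], "text": [...]})` followed by `scores, examples = search(ds, "when do orders ship")`, where `examples` is a dict of columns for the retrieved passages.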
