RAG custom dataset

nbroad · September 29, 2020, 12:52pm

I just saw that Facebook AI released a blog post about RAG (Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models) and that it is already incorporated in the HuggingFace API.

I looked quickly, and I couldn’t see how to use a custom dataset with it. It seems like it will only pull down indexed datasets from HuggingFace’s AWS storage. I’m wondering if anyone can show me how to:

1. Create an indexed dataset. I’m assuming this is just a big collection of embeddings made by running documents through a model and taking the output embedding. I’m wondering which model(s) can be used, how many dimensions the embeddings are expected to have, and how to format all of these vectors.
2. Use that custom dataset with HF RAG models.

thomwolf · September 30, 2020, 12:31pm

Indeed, it’s actually very simple to do with the datasets library, and it is partly explained on this page: https://huggingface.co/docs/datasets/faiss_and_ea.html

We will add an example script on this.
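Until that script exists, here is a minimal, dependency-free sketch of what an "indexed dataset" amounts to: one row per passage with a title, a text, and an embedding vector, searched by inner product. It is not the official example. `toy_embed`, `build_indexed_dataset`, and `retrieve` are hypothetical names made up for illustration, and the hashed bag-of-words embedding is only a stand-in for a real encoder. In a real setup the vectors would come from a DPR context encoder (768 dimensions), and the brute-force search would be replaced by a FAISS index, which the datasets library can attach to an embeddings column.

```python
import hashlib
import math

DIM = 64  # toy size (assumption); the DPR encoders used with RAG output 768-dim vectors


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a real encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_indexed_dataset(passages):
    """Build one row per passage with "title", "text" and "embeddings" fields."""
    return [
        {"title": title, "text": text, "embeddings": toy_embed(text)}
        for title, text in passages
    ]


def retrieve(dataset, question, k=2):
    """Return the k (score, row) pairs with the largest inner product with the question."""
    q = toy_embed(question)
    scored = [
        (sum(a * b for a, b in zip(q, row["embeddings"])), row) for row in dataset
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up passages, purely for illustration.
passages = [
    ("Paris", "paris is the capital of france"),
    ("Berlin", "berlin is the capital of germany"),
    ("Faiss", "faiss is a library for similarity search"),
]
dataset = build_indexed_dataset(passages)

for score, row in retrieve(dataset, "what is the capital of france", k=2):
    print(round(score, 3), row["title"])
```

The row layout (title, text, embeddings) mirrors the columns a custom RAG index is expected to carry. The retrieval step is exhaustive here only to keep the sketch short.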

LeroyDyer · July 23, 2024, 11:30am

Today it’s a dead page.
