I just saw that Facebook AI released a blog post about RAG ("Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models") and that it is already incorporated into the HuggingFace API.
I took a quick look, and I couldn’t see how to use a custom dataset with it. It seems like it will only pull down pre-indexed datasets from HuggingFace’s AWS storage. I’m wondering if anyone can show me how to:
- Create an indexed dataset. I’m assuming this is just a big collection of embeddings that have been made by running documents through a model and taking the output embedding. I’m wondering which model(s) can be used, how many dimensions the embeddings are expected to be, and how to format all of these vectors.
- Use that custom dataset with the HF RAG models.
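
To make my assumption about the indexed dataset concrete, here is a toy sketch of what I picture it to be. Everything in it is hypothetical: `toy_embed` is a word-hashing stand-in for a real encoder model, `DIM = 8` is an arbitrary size, and `build_index` and `retrieve` are my own helper names, not part of the HuggingFace API. I assume the real thing uses a trained encoder and a proper vector index rather than a brute-force scan.

```python
import hashlib
import math

DIM = 8  # arbitrary toy size; a real encoder would dictate the dimension


def toy_embed(text, dim=DIM):
    """Stand-in for a real encoder: hash words into buckets, then L2-normalise."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(passages):
    """Store every passage next to its embedding (the 'indexed dataset')."""
    return [
        {"title": p["title"], "text": p["text"], "embedding": toy_embed(p["text"])}
        for p in passages
    ]


def retrieve(index, query, k=1):
    """Brute-force search: rank passages by inner product with the query embedding."""
    q = toy_embed(query)

    def score(row):
        return sum(a * b for a, b in zip(q, row["embedding"]))

    return sorted(index, key=score, reverse=True)[:k]


passages = [
    {"title": "RAG", "text": "RAG combines a retriever with a generator"},
    {"title": "FAISS", "text": "FAISS searches dense vectors by inner product"},
]
index = build_index(passages)
top = retrieve(index, passages[0]["text"], k=1)
print(top[0]["title"])
```

If this picture is roughly right, my questions reduce to which model should replace `toy_embed`, what `DIM` has to be, and what column layout the RAG retriever expects in place of these plain dicts.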