In my experience, a lot of these issues stem from junk data being pulled in during retrieval. Even with the best generation model, if the context it works with is irrelevant, outdated, or poorly filtered, the output can easily drift into hallucination. The key issue often isn’t the model itself but the quality of the data being retrieved. If we improve retrieval so that the retriever pulls only relevant, accurate, and up-to-date information, we can significantly reduce the chance of hallucinations in the generation stage. Filtering and preprocessing the data before it reaches the model is a crucial step in improving output quality.
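To make that concrete, here’s a minimal sketch of the kind of pre-generation filter I have in mind, dropping low-relevance or stale chunks before they ever reach the prompt. The `RetrievedChunk` fields, the thresholds, and the `filter_chunks` helper are illustrative assumptions, not the API of any particular RAG framework:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RetrievedChunk:
    text: str                # chunk content returned by the retriever
    score: float             # similarity score from the retriever
    last_updated: datetime   # freshness metadata attached at indexing time

def filter_chunks(chunks, min_score=0.75, max_age_days=365):
    """Keep only relevant, reasonably fresh chunks before generation."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    kept = [
        c for c in chunks
        if c.score >= min_score and c.last_updated >= cutoff
    ]
    # Put the most relevant surviving context first in the prompt.
    return sorted(kept, key=lambda c: c.score, reverse=True)
```

Even a simple gate like this (score threshold plus a freshness cutoff) catches a lot of the junk that otherwise ends up as "context" the model then confidently paraphrases.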
In RAG systems, who's really responsible for hallucination... the model, the retriever, or the data?