Fine-Tuning + RAG-Based Chatbot: Dataset Structure & Instruction Adherence Issues

:rocket: Fine-Tuning a Document-Based Chatbot – Issues and Questions

Hello everyone!
I am working on fine-tuning a chatbot that generates answers based on documents (RAG + fine-tuning).
During the tuning process, I encountered several issues, and I would appreciate any insights or solutions from those with experience in this area.


:rocket: Question 1: How should the dataset be structured for training a document-based chatbot?
When training a model to generate document-based answers:
:white_check_mark: Should I use a question-answer dataset?
:white_check_mark: Or should I build a question-document-answer dataset?

I’d love to know the common approach!
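For illustration, here is a minimal sketch of what a record in each format might look like. The field names and contents are placeholders I made up, not an established schema:

```python
import json

# Format 1: question-answer only (no document in the training data).
qa_record = {
    "question": "What is the warranty period for product X?",
    "answer": "The warranty period is two years.",
}

# Format 2: question-document-answer (the retrieved document is part of
# the training example, so the model can learn to ground its answer in it).
qda_record = {
    "question": "What is the warranty period for product X?",
    "document": "Product X ships with a two-year limited warranty covering ...",
    "answer": "According to the document, the warranty period is two years.",
}

# One JSON object per line (JSONL), a common layout for fine-tuning data.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in (qa_record, qda_record):
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```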


:rocket: Question 2: Issues encountered after experimenting with two training methods

:one: Training with a Question + Answer dataset
:heavy_check_mark: The responses were natural, but hallucinations (fabricated information) occurred.
:heavy_check_mark: The model generated answers even when the provided document contained no relevant information.
:heavy_check_mark: To prevent this, I added the following instructions to the inference-time prompt:

  • “If the document does not contain relevant information for the question, respond with: ‘Sorry, I couldn’t find any relevant information.’”
  • “End the response with: ‘Thank you :):)’”
:heavy_check_mark: However, the model did not follow these instructions.

:pushpin: Here is the prompt I used:

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```

:heavy_check_mark: I included multiple instructions in the Instruction section, but the model did not adhere to them.
:heavy_check_mark: Additionally, I did not include any documents during training; they were only added to the prompt at inference time (sketched below).
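
To make the setup concrete, here is a simplified sketch of how I assemble the prompt at inference time with the template above (the retriever call itself is omitted; `build_inference_prompt` is just my own helper):

```python
# The two instructions I listed above, placed in the Instruction section.
instruction = (
    "If the document does not contain relevant information for the question, "
    "respond with: 'Sorry, I couldn't find any relevant information.' "
    "End the response with: 'Thank you :):)'"
)

def build_inference_prompt(question: str, retrieved_document: str) -> str:
    # Document and question both go into the Input section; the Response
    # slot is left empty so the model generates the answer.
    user_input = f"Document:\n{retrieved_document}\n\nQuestion:\n{question}"
    return alpaca_prompt.format(instruction, user_input, "")
```

My understanding is that because the training examples never contained these instructions (or any document), the model had no opportunity to learn this refusal behavior, which might explain why it ignores them at inference time. Is that the right way to think about it?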


:two: Training with a Question + Document + Answer dataset
:heavy_check_mark: When training with documents included, the generated answers were incoherent and inconsistent.
:heavy_check_mark: In some cases, the model directly copied parts of the document instead of generating a proper response.
:heavy_check_mark: The documents I used for training were quite long. Could that be the reason?
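
One idea I'm considering is chunking the documents before building training examples, so each example fits comfortably in the context window. A rough sketch (the chunk size and overlap are arbitrary numbers, nothing I've validated):

```python
CHUNK_SIZE = 2000  # characters per chunk (arbitrary; token-based is also common)
OVERLAP = 200      # characters shared between neighbouring chunks

def chunk_document(text: str) -> list[str]:
    """Split a long document into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + CHUNK_SIZE])
        start += CHUNK_SIZE - OVERLAP
    return chunks
```

Would pairing each question only with the chunk(s) that actually contain the answer be the better practice here?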


:question: What is the standard approach for training a document-based chatbot?
For tasks involving document-based answer generation, how is training typically conducted?
Is there a better approach than what I have tried?

I would really appreciate any insights or advice! :blush:

#fine-tuning #llama #rag #instruction-tuning #hallucination #dataset-preparation #inference #prompt-engineering #large-language-models #document-based-chatbot
