How to fine-tune on my own private data and then build a chatbot on it?

So far, the fine-tuning examples I have seen cover summarisation, chatbots for specific use cases, etc. However, I want to build a chatbot based on my own private data (hundreds of PDF and Word files). How can I fine-tune on this? The approach I am thinking of is (a rough sketch of step 1 follows the list):
1-> LoRA fine-tuning of the base Alpaca model on my own private data
2-> LoRA fine-tuning of the above model on some input/output prompts.
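
Below is a minimal sketch of what step 1 could look like with Hugging Face `transformers`, `peft`, and `datasets`. The base model name, the `my_private_docs.txt` file, and the hyperparameters are placeholders I am assuming, not a confirmed recipe:

```python
# Sketch: LoRA fine-tuning of a causal LM on raw text extracted from private documents.
# Base model, file path, and hyperparameters are illustrative placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base_model = "huggyllama/llama-7b"            # placeholder: any Alpaca-style base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token      # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Step 1: adapt to the domain by training on raw document text (one chunk per line).
dataset = load_dataset("text", data_files={"train": "my_private_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# LoRA: train small low-rank adapters instead of updating all model weights.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, bias="none",
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-domain", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4, fp16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-domain")
# Step 2 would repeat the same loop on formatted instruction/response pairs.
```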

Is this a good technique for building a chatbot on private datasets? Could someone please suggest a good way of building a model based on private data?


Hi, I found Abishek Thakur's YouTube channel to be really helpful for fine-tuning; he posts incredible related content. As for the video I followed, I took the lesson from 1littlecoder on training Falcon-7B on Colab.
Here is the link: How-To Instruct Fine-Tuning Falcon-7B [Google Colab Included] - YouTube

@Saugatkafley, thank you for your response. I have already experimented with this type of training, which involves prompt-based fine-tuning, and it has been effective for me.
To elaborate further, consider a scenario where I possess private documents and wish to answer prompts based on that data, but the language model lacks any knowledge of these specific documents. Even if I fine-tune the model using the prompts demonstrated in the video, it would likely miss crucial information present in the private documents. I want the model to have knowledge of those documents as well.

Have you considered using a QA model?

Prompt: Tell me about X.
The QA/retrieval model retrieves relevant text chunks based on "Tell me about X".
The text chunks are placed into a new prompt that is then fed to the LLM: `f'{original_prompt} using this context: {relevant_text_chunks}'`. A minimal sketch of this flow is below.
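
Here is one way that retrieve-then-prompt flow could look, using `sentence-transformers` for embedding-based retrieval. The embedding model, the example chunks, and the prompt template are illustrative assumptions rather than the original poster's setup:

```python
# Sketch: retrieve relevant chunks for a question, then fold them into the LLM prompt.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Chunks extracted from the private PDF/Word files (placeholder content).
chunks = [
    "Project X was started in 2021 to automate invoice processing.",
    "The onboarding policy requires security training within 30 days.",
    "Quarterly reports are stored on the internal document share.",
]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most similar chunks and insert them into the prompt."""
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=top_k)[0]
    relevant_text_chunks = "\n".join(chunks[hit["corpus_id"]] for hit in hits)
    return f"{question} using this context: {relevant_text_chunks}"

print(build_prompt("Tell me about X."))
# The returned string is what you would feed to the (optionally LoRA-tuned) LLM.
```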


Have you looked into RAG (retrieval-augmented generation)?