How to prevent catastrophic forgetting in fine tuned large language models?

Hi there!
I decided to make a chatbot for my business, so I am fine-tuning a pretrained Llama 2 model for a question-answering downstream task. However, I have two issues:
1- When I fine-tune the model, it works very well on questions about my business. But when I ask about a city from somewhere else, the model responds as if it were a project developed by my company in that city. I think it answers every question, or any question with similar wording, as if it were about my company, even when it is not.

2- When I give the names of my projects to the model as special tokens, the model handles general questions and answers, but when I ask about my projects, it gives the wrong answer.

How can I handle these issues? Could you give me some advice?

1 Like

Hi @mertoguzhan !
You asked a good question. Catastrophic forgetting is an important problem: Catastrophic interference - Wikipedia
This is one of the open issues in the LLM field. There are many methods, and I will introduce one family: Parameter-Isolation Approaches.

  • Progressive Neural Networks (PNNs): Grow the network by adding new branches for each task while freezing the parameters of previous tasks.
  • Dynamic Architecture Approaches: Dynamically allocate neurons or layers to tasks (e.g., PackNet, Piggyback).
But in my opinion, if you want the model to know about your company and your dataset is not large, you should use RAG. You can have the model answer questions about your company by retrieving the relevant documents from a vector database; see the sketch below.
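
Here is a minimal retrieval sketch, assuming the sentence-transformers library is installed; the model name, example documents, and prompt format are placeholders, not something from your setup:

```python
from sentence_transformers import SentenceTransformer, util

# 1. Embed your company documents once and keep the vectors
#    (a real setup would store them in a vector database).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
docs = [
    "Project Alpha is a residential development in Ankara.",   # made-up examples
    "Project Beta is an office complex completed in 2022.",
]
doc_embeddings = encoder.encode(docs, convert_to_tensor=True)

# 2. At question time, retrieve the most similar document.
question = "Tell me about Project Alpha."
q_embedding = encoder.encode(question, convert_to_tensor=True)
hits = util.semantic_search(q_embedding, doc_embeddings, top_k=1)[0]
context = docs[hits[0]["corpus_id"]]

# 3. Put the retrieved context into the prompt of the unmodified base model,
#    so company knowledge comes from retrieval instead of weight updates.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

Because the base model's weights are never changed, there is nothing to forget: general knowledge stays intact and company knowledge lives in the retrieved documents.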
1 Like

The general techniques for preventing catastrophic forgetting are using a smaller learning rate and regularization, both of which keep the weight updates from being too large.
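
For example, a rough sketch with Hugging Face TrainingArguments (the hyperparameter values are illustrative, not tuned recommendations):

```python
from transformers import TrainingArguments

# Conservative settings that limit how far the fine-tuned weights drift
# from the pretrained model (illustrative values; adjust for your data).
training_args = TrainingArguments(
    output_dir="llama2-business-qa",   # placeholder output directory
    learning_rate=1e-5,                # small learning rate -> small updates per step
    weight_decay=0.01,                 # L2-style regularization on the weights
    num_train_epochs=1,                # fewer passes over a small dataset also limits drift
    per_device_train_batch_size=4,
)
```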

I don’t see people talk about this much, but I like to freeze layers, especially the earlier ones. This technique was used in papers like BART.
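
As a rough sketch of what that looks like for a LLaMA-style model in transformers (the checkpoint name and the number of frozen blocks are just examples):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze the token embeddings and the earliest decoder blocks so the general
# language knowledge stored there is not overwritten during fine-tuning.
for param in model.model.embed_tokens.parameters():
    param.requires_grad = False
for layer in model.model.layers[:8]:   # first 8 of 32 blocks; tune this number
    for param in layer.parameters():
        param.requires_grad = False

# Only parameters with requires_grad=True will be updated by the optimizer/Trainer.
```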

2 Likes

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.