How to prevent catastrophic forgetting in fine tuned large language models?

Hi there!
I decided to make a chatbot for my business, so I am fine-tuning a pretrained Llama 2 model for a question-answering downstream task. However, I have two issues:
1- When I fine-tune the model, it works very well on questions about my business. But when I ask about a city from somewhere else, the model responds as if it were a project developed by my company in that city. I think it answers every question, or any question with similar wording, as if it were about my company, even when it is not.

2- When I give the names of my projects to the model as special tokens, the model handles general questions and answers, but when I ask about my projects, it gives the wrong answer.

How can I handle these issues? Could you give me some advice?

1 Like

Hi @mertoguzhan !
You asked a good question. Catastrophic forgetting is an important problem: Catastrophic interference - Wikipedia
This is one of the open issues in the LLM field. There are many methods, and I will introduce one family: Parameter-Isolation Approaches.

  • Progressive Neural Networks (PNNs): Grow the network by adding new branches for each task while freezing the parameters of previous tasks.
  • Dynamic Architecture Approaches: Dynamically allocate neurons or layers to tasks (e.g., PackNet, Piggyback).
But in my opinion, if you want the model to know about your company and your dataset is not large, you should use RAG. You can have the model answer questions about your company by retrieving the relevant documents from a vector database; see the sketch below.
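
Here is a minimal retrieval sketch, assuming the sentence-transformers library is installed; the model name, example documents, and prompt format are placeholders, not something from your setup:

```python
from sentence_transformers import SentenceTransformer, util

# 1. Embed your company documents once and keep the vectors
#    (a real setup would store them in a vector database).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
docs = [
    "Project Alpha is a residential development in Ankara.",   # made-up examples
    "Project Beta is an office complex completed in 2022.",
]
doc_embeddings = encoder.encode(docs, convert_to_tensor=True)

# 2. At question time, retrieve the most similar document.
question = "Tell me about Project Alpha."
q_embedding = encoder.encode(question, convert_to_tensor=True)
hits = util.semantic_search(q_embedding, doc_embeddings, top_k=1)[0]
context = docs[hits[0]["corpus_id"]]

# 3. Put the retrieved context into the prompt of the unmodified base model,
#    so company knowledge comes from retrieval instead of weight updates.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

Because the base model's weights are never changed, there is nothing to forget: general knowledge stays intact and company knowledge lives in the retrieved documents.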
1 Like

The general techniques for preventing catastrophic forgetting are using a smaller learning rate and regularization, both of which keep the weight updates from being too large.
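
For example, a rough sketch with Hugging Face TrainingArguments (the hyperparameter values are illustrative, not tuned recommendations):

```python
from transformers import TrainingArguments

# Conservative settings that limit how far the fine-tuned weights drift
# from the pretrained model (illustrative values; adjust for your data).
training_args = TrainingArguments(
    output_dir="llama2-business-qa",   # placeholder output directory
    learning_rate=1e-5,                # small learning rate -> small updates per step
    weight_decay=0.01,                 # L2-style regularization on the weights
    num_train_epochs=1,                # fewer passes over a small dataset also limits drift
    per_device_train_batch_size=4,
)
```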

I don’t see people talk about this much, but I like to freeze layers, especially the earlier ones. This technique was used in papers like BART.
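
As a rough sketch of what that looks like for a LLaMA-style model in transformers (the checkpoint name and the number of frozen blocks are just examples):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze the token embeddings and the earliest decoder blocks so the general
# language knowledge stored there is not overwritten during fine-tuning.
for param in model.model.embed_tokens.parameters():
    param.requires_grad = False
for layer in model.model.layers[:8]:   # first 8 of 32 blocks; tune this number
    for param in layer.parameters():
        param.requires_grad = False

# Only parameters with requires_grad=True will be updated by the optimizer/Trainer.
```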

2 Likes

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.