What’s the best strategy for fine-tuning a large language model (LLM) on domain-specific data without catastrophic forgetting?

How can we fine-tune a large model for a specific domain while still keeping its general knowledge and skills?


Common approaches include rehearsal (mixing general-domain data into the fine-tuning set so the model keeps seeing its original distribution) and freezing parts of the network so only a subset of parameters is updated. These techniques are often used in combination.
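A minimal sketch of both ideas in PyTorch, using a toy model as a stand-in for an LLM (the layer names and sizes here are illustrative, not from any real checkpoint):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained model: two "lower" layers that hold
# general knowledge, plus a head we want to adapt to the new domain.
model = nn.Sequential(
    nn.Linear(8, 8),   # pretrained layer "0" -- will be frozen
    nn.ReLU(),
    nn.Linear(8, 8),   # pretrained layer "2" -- will be frozen
    nn.ReLU(),
    nn.Linear(8, 2),   # task head "4" -- stays trainable
)

# Freeze everything except the final head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("4.")

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# Rehearsal/replay sketch: mix a batch of (hypothetical) general-domain
# data with the domain batch so the original distribution stays present.
domain_x, general_x = torch.randn(8, 8), torch.randn(8, 8)
x = torch.cat([domain_x, general_x])
y = torch.randint(0, 2, (16,))

# One dummy fine-tuning step.
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Frozen layers accumulate no gradients; the head does.
print(model[0].weight.grad is None)       # True
print(model[4].weight.grad is not None)   # True
```

With a real LLM you would freeze by module (e.g. all but the last few transformer blocks) rather than by index, but the mechanism is the same: set `requires_grad = False` and keep frozen parameters out of the optimizer.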