TL;DR: I want to fine-tune ‘meta-llama/Llama-2-7b-chat-hf’ with my own instruction data, but the fine-tuned model turns out to be worse than the original one.
How should I differentiate in the code between different types of fine-tuning? I saw different types of fine-tuning on the internet:
Domain adaptation
Instruction-based fine-tuning (it looks like this is what is needed)
Supervised fine-tuning …
But how should they differ in the code? For example, assuming that I use AutoModelForCausalLM and a Trainer object, a simple script would look like this:
Essentially, trainer.train() optimizes the model to predict the next token. So whether the train_dataset or eval_dataset you pass to the trainer holds tokens or strings in each element, the goal is always to predict the subsequent token.
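To make the objective concrete, here is a minimal pure-Python sketch of what "predict the next token" means for one sequence. The function name and the token ids are made up for illustration; a real trainer computes the same thing in a batched, shifted form rather than building explicit pairs.

```python
def next_token_pairs(token_ids):
    """Return (context, target) pairs: every prefix is used to predict the token after it."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

# Hypothetical token ids standing in for one tokenized training example.
tokens = [101, 7, 42, 9]
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

Every position in the sequence contributes one prediction, regardless of what kind of text the sequence came from.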
From this perspective, there is no distinction between Domain Adaptation and Instruction-based Fine-tuning – the difference lies solely in the data.
In Domain Adaptation, the data usually consists of many documents concatenated into a single, very long text. The objective remains next-token prediction, with the text split into windows of a specified context length (e.g., 2K, 4K, … tokens). Because the data is so large, this stage often accounts for the bulk of the learning.
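The concatenate-then-window step could be sketched roughly as follows. This is an illustration in plain Python, not the code of any particular library; pack_into_windows, the eos_id separator, and the choice to drop the incomplete last window are all assumptions.

```python
def pack_into_windows(documents, context_length, eos_id):
    """Concatenate tokenized documents and cut the stream into fixed-size windows."""
    stream = []
    for doc in documents:
        stream.extend(doc)
        stream.append(eos_id)  # separator so consecutive documents do not run together
    # Keep only as many tokens as fill whole windows; the remainder is dropped.
    usable = (len(stream) // context_length) * context_length
    return [stream[i:i + context_length] for i in range(0, usable, context_length)]

# Hypothetical tokenized documents; context_length would be 2048 or 4096 in practice.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
windows = pack_into_windows(docs, context_length=4, eos_id=0)
print(windows)
```

Each window then serves as one training example, and a window may span the boundary between two documents.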
Instruction Tuning usually involves a few thousand prompt/completion pairs aimed at teaching the model how to respond to instructions. By default, though, the learning method is no different: the model still attempts to predict the next token, whether that token belongs to the prompt or to the completion.
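As a rough sketch of how such pairs become ordinary training text, each prompt and completion can be joined into one string before tokenization. The template below is hypothetical and only for illustration; a chat model normally expects its own prompt format, so check the model card for the real one.

```python
# Hypothetical template; not the format of any specific model.
PROMPT_TEMPLATE = "### Instruction:\n{prompt}\n\n### Response:\n"

def format_example(prompt, completion):
    """Join one prompt/completion pair into a single training string."""
    prompt_text = PROMPT_TEMPLATE.format(prompt=prompt)
    return {"prompt_text": prompt_text, "text": prompt_text + completion}

# Made-up instruction data.
pairs = [{"prompt": "Translate 'chat' from French to English.", "completion": "cat"}]
examples = [format_example(p["prompt"], p["completion"]) for p in pairs]
print(examples[0]["text"])
```

Once formatted this way, the "text" field is just another string to the trainer, which is why the code path matches domain adaptation.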
Please note that you can optionally configure training so that the model learns only from the completion part by using the SFTTrainer from TRL instead of the regular Trainer. SFTTrainer is a subclass of Trainer, so it is used in much the same way.
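The idea behind completion-only training can be sketched by hand: labels for the prompt tokens are set to -100, the value PyTorch's cross-entropy loss ignores by default, so only completion tokens contribute to the loss. This is a simplified illustration of the concept and not TRL's actual implementation; the function name and token ids are made up.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_prompt_labels(prompt_ids, completion_ids):
    """Build input_ids and labels so that only completion tokens are scored."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}

# Hypothetical token ids for a prompt and its completion.
example = mask_prompt_labels(prompt_ids=[11, 12, 13], completion_ids=[21, 22])
print(example)
```

The model still sees the prompt as context; it is simply not penalized for failing to predict the prompt tokens themselves.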
For more information, see Train on Completions Only.