I want to fine-tune an LLM on an instruction dataset, which consists of pairs of prompts and completions. I have seen a lot of tutorials on how to fine-tune LLMs with supervised datasets. Almost all of them use Trainer or SFTTrainer from Hugging Face.
What surprised me is that there is no difference between this fine-tuning and the pretraining process; in both cases, the model tries to predict the next token for both the prompt and the completion.
Intuitively, I would prefer to backpropagate the loss only through the completion tokens, not the prompt itself. In fact, I believe next-token prediction should only start at the completion stage. Does that make sense?
Does anyone know of any library that can perform training as I expect?
What you want is typically handled in the data preprocessing step. To the best of my knowledge, you can manually replace the prompt portion of the labels with -100, a special value that is ignored during loss calculation by the PyTorch backend. Most third-party LLM fine-tuning repos do this, for example llama-recipes (officially supported by Llama) or LLaMA-Factory, a very popular fine-tuning framework.
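As a rough sketch of the masking idea above, assuming the prompt and completion have already been tokenized into id lists (the helper name `build_features` is mine, not from any library):

```python
IGNORE_INDEX = -100  # the default ignore_index of torch.nn.CrossEntropyLoss

def build_features(prompt_ids, completion_ids):
    """Return the three columns a causal-LM trainer expects
    (input_ids, attention_mask, labels), with the prompt part of
    the labels replaced by IGNORE_INDEX so it contributes no loss."""
    input_ids = prompt_ids + completion_ids
    attention_mask = [1] * len(input_ids)  # no padding in this sketch
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }

# Example: 3 prompt tokens, 2 completion tokens
features = build_features([11, 12, 13], [21, 22])
print(features["labels"])  # → [-100, -100, -100, 21, 22]
```

A function like this can be passed to `dataset.map` to produce the masked labels before handing the dataset to the trainer.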
I checked, but the dataset doesn’t seem to be “shifted right by one token”. Can you explain why? By the way, could you share code showing how to add your own dataset with 3 columns: input_ids, attention_mask, label?
We look forward to receiving your feedback!