Instruction tuning an LLM

I want to fine-tune an LLM with an instruction dataset, which consists of pairs of prompts and completions. I have seen a lot of tutorials on how to fine-tune LLMs with supervised datasets, and almost all of them use Trainer or SFTTrainer from Hugging Face.

What surprised me is that there seems to be no difference between this fine-tuning and the pretraining process: in both cases, the model is trained to predict the next token for both the prompt and the completion.

Intuitively, I would prefer to backpropagate only on the completion tokens and not on the prompt itself. In other words, I believe the next-token prediction loss should only start at the completion. Does that make sense?

Does anyone know of a library that supports this kind of training?

Hi,

That’s supported in the TRL library via the DataCollatorForCompletionOnlyLM class; see the Supervised Fine-tuning Trainer docs.
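A minimal sketch of how that can look, closely following the example in the TRL docs (the model name, dataset, and response template below are placeholders; exact arguments may differ across TRL versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Placeholder model and dataset; substitute your own.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

def formatting_prompts_func(example):
    # Wrap each prompt/completion pair in a simple template.
    output_texts = []
    for i in range(len(example["instruction"])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

# The collator masks everything up to and including the response template with -100,
# so the loss is only computed on the completion tokens.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()
```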


What you want can also be handled in the data preprocessing step. To the best of my knowledge, you can manually replace the prompt part of the labels with -100, a special value that is ignored in the loss calculation by the PyTorch backend (most third-party LLM fine-tuning repos do something like this, e.g. llama-recipes, officially supported by Llama, or LLaMA-Factory, a very popular fine-tuning framework).
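A minimal sketch of that preprocessing, assuming a tokenizer and prompt/completion fields (the model name, field names, and max length here are made up for illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def preprocess(example, max_length=512):
    # Tokenize prompt and completion separately so we know where the prompt ends.
    prompt_ids = tokenizer(example["prompt"], add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(example["completion"], add_special_tokens=False)["input_ids"]
    completion_ids = completion_ids + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Mask the prompt tokens with -100 so cross-entropy ignores them;
    # loss is then computed on the completion tokens only.
    labels = ([-100] * len(prompt_ids) + completion_ids)[:max_length]
    attention_mask = [1] * len(input_ids)

    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
```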


Can you help me check how the training data is generated when passing the “text” column of the dataset to the trainer? I mean verify it with code. Thanks!

@cungnlp this can be checked by calling trainer.get_train_dataloader(). You can then inspect some samples of the dataloader:

# Build the dataloader the Trainer will iterate over during training
train_dataloader = trainer.get_train_dataloader()

# Grab one batch and inspect the collated tensors (input_ids, attention_mask, labels)
batch = next(iter(train_dataloader))
print(batch)

Thank you for your help!


I checked, but the dataset doesn’t seem to be shifted right by one token. Can you explain why? By the way, could you share code for adding your own dataset with 3 columns: input_ids, attention_mask, and labels?
Looking forward to your feedback!
