Problems with understanding instruction fine-tuning

I’m trying to read up on instruction fine-tuning, but I think I have a big misunderstanding.

As I understand it, instruction datasets typically have 3 components: (a) the instruction, (b) the output/response, and (c) an optional input. Now, according to this paper: “Based on the collected IT dataset, a pretrained model can be directly fine-tuned in a fully-supervised manner, where given the instruction and the input, the model is trained by predicting each token in the output sequentially.” This makes sense to me, i.e., the response/output is the ground truth the model is expected to predict.
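For concreteness, a single record in such a dataset might look like this (a made-up example of my own, in the Alpaca-style three-field format):

```python
# Made-up example of one instruction-tuning record with the three components.
record = {
    "instruction": "Translate the sentence to French.",   # (a) the instruction
    "input": "The weather is nice today.",                # (c) optional input
    "output": "Il fait beau aujourd'hui.",                # (b) the response (ground truth)
}

# The paper's description: given (a) + (c), the model is trained to predict
# the tokens of (b) one at a time.
assert set(record) == {"instruction", "input", "output"}
```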

However, when I check many tutorials (e.g., this tutorial notebook), it seems that instructions, inputs, and outputs are all combined into a single text sample, and I can’t tell from the notebook how the training then works. What, then, is the ground truth for the supervised training? Or is this now treated as a next-word-prediction task?
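To make my confusion concrete, here is how I imagine the combined sample being built, and the two ways I could see the labels being constructed. The template and names (`PROMPT_TEMPLATE`, `IGNORE_INDEX`, `build_example`) are my own guesses, not taken from the notebook:

```python
# Hypothetical sketch: packing an instruction-tuning record into ONE token
# sequence, and two possible choices of "ground truth" labels.
# All names/templates here are my own assumptions, not from the notebook.

IGNORE_INDEX = -100  # value commonly ignored by cross-entropy losses

PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_example(instruction, inp, output, tokenize):
    """Concatenate prompt + output into one sequence and build labels."""
    prompt_ids = tokenize(PROMPT_TEMPLATE.format(instruction=instruction, input=inp))
    output_ids = tokenize(output)
    input_ids = prompt_ids + output_ids

    # Option 1: plain next-word prediction over the WHOLE sequence.
    # Labels are a copy of input_ids; the one-position shift is applied
    # inside the loss computation.
    labels_lm = list(input_ids)

    # Option 2: mask the prompt so only the response tokens contribute
    # to the loss -- i.e., the output is the only "ground truth".
    labels_masked = [IGNORE_INDEX] * len(prompt_ids) + list(output_ids)

    return input_ids, labels_lm, labels_masked

# Toy whitespace "tokenizer" just to keep the sketch self-contained.
toy_tokenize = lambda s: s.split()

ids, lm, masked = build_example("Add the numbers.", "2 3", "5", toy_tokenize)
assert lm == ids                              # option 1: everything is a target
assert masked[-1] == "5"                      # option 2: only the response...
assert all(x == IGNORE_INDEX for x in masked[:-1])  # ...the prompt is ignored
```

So my question boils down to: does the tutorial use option 1 (pure next-word prediction over the concatenated text) or option 2 (loss only on the response)?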

What am I missing? Or are these indeed two different approaches to instruction tuning? Sorry if these are stupid questions!