Fine-tune with SFTTrainer

I noticed that, according to the trainer’s documentation, I have to provide a "text" field when fine-tuning the model (trl/trl/trainer/ at 18a33ffcd3a576f809b6543a710e989333428bd3 · huggingface/trl · GitHub). However, this does not look like a supervised task!

Looking further, I saw that the model’s labels are the same as the input_ids, just shifted by one position. So how can this be considered supervised learning? As I understand it, the prompt should be the input and the completion should be the label, but here there are no separate prompts and completions, only raw text.

Could you clarify what I am missing here?


SFT (supervised fine-tuning) is called supervised because the training data (instructions paired with the completions we want) is collected from humans. However, we’re still training the model with the same cross-entropy loss as during pre-training (i.e. predicting the next token).
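To make the point about the loss concrete, here is a minimal pure-Python sketch of next-token cross-entropy: every position is scored by the negative log-probability the model assigns to the token that actually comes next. The function name and the toy probabilities are made up for illustration; real trainers compute this from logits over the full vocabulary (e.g. with PyTorch's cross-entropy), but the quantity is the same whether the text is pre-training data or an instruction plus its completion.

```python
import math

def next_token_cross_entropy(probs, token_ids):
    """Average negative log-likelihood of each next token.

    probs[t] is the (toy) predicted distribution after reading token_ids[0..t];
    the target for position t is token_ids[t + 1]. The last position has no target.
    """
    losses = []
    for t in range(len(token_ids) - 1):
        target = token_ids[t + 1]
        losses.append(-math.log(probs[t][target]))
    return sum(losses) / len(losses)

# Toy vocabulary of 3 tokens and the sequence [0, 2, 1]
probs = [
    [0.1, 0.2, 0.7],  # after token 0: probability 0.7 on the true next token (2)
    [0.3, 0.5, 0.2],  # after tokens 0, 2: probability 0.5 on the true next token (1)
    [0.4, 0.3, 0.3],  # prediction after the final token: no target, unused
]
loss = next_token_cross_entropy(probs, [0, 2, 1])
print(round(loss, 4))  # 0.5249
```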

Fine-tuning simply makes it more likely that the model generates a useful completion for a given instruction: for an instruction like “what are 10 things to do in London”, the model should learn to generate something like “in London, you can visit (…)”.
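Before training, an instruction/completion pair like the London example has to become one string in the "text" column. A minimal sketch of that step follows; the `### Instruction:` / `### Response:` template and the `format_example` name are hypothetical choices for illustration, not something TRL prescribes. Any consistent template works, as long as the same one is used at inference time.

```python
def format_example(example):
    """Merge an instruction/completion pair into a single "text" field.

    The template below is a hypothetical example, not a TRL requirement.
    """
    text = (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['completion']}"
    )
    return {"text": text}

pair = {
    "instruction": "What are 10 things to do in London?",
    "completion": "In London, you can visit (…)",
}
print(format_example(pair)["text"])
```

With a Hugging Face `datasets.Dataset`, a function like this could be applied with `dataset.map(format_example)`, which adds the returned "text" column to every row.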

Since the model is still trained to predict the next token, we simply concatenate the instruction and the completion into a single “text” column, and the labels are created from the inputs shifted by one position, so that the label at position t is the token at position t + 1 (exactly as during pre-training). One can also choose to train the model only on the completions rather than on the instructions, but by default TRL’s SFTTrainer trains the model to predict both the instructions and the completions.
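Here is a toy sketch of how the labels relate to the concatenated inputs, and what "training only on the completions" changes. It assumes made-up token ids and hypothetical helper names. It follows the Transformers convention that `labels` is a copy of `input_ids` (the one-position shift happens inside a causal LM's loss computation) and that targets equal to -100 are ignored, which is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def build_labels(prompt_ids, completion_ids, train_on_prompt=True):
    """Return (input_ids, labels) for one concatenated example (hypothetical helper)."""
    input_ids = prompt_ids + completion_ids
    labels = list(input_ids)  # labels start as a copy of the inputs
    if not train_on_prompt:
        # Mask the instruction tokens so only completion tokens are scored.
        labels[: len(prompt_ids)] = [IGNORE_INDEX] * len(prompt_ids)
    return input_ids, labels

def shifted_pairs(input_ids, labels):
    """(prefix, next-token target) pairs the loss actually scores after the shift."""
    return [
        (input_ids[: t + 1], labels[t + 1])
        for t in range(len(input_ids) - 1)
        if labels[t + 1] != IGNORE_INDEX
    ]

prompt_ids = [11, 12, 13]   # toy token ids for the instruction
completion_ids = [21, 22]   # toy token ids for the completion

# Default behaviour: every next token is a target, instruction tokens included.
ids, labels = build_labels(prompt_ids, completion_ids)
print(shifted_pairs(ids, labels))
# [([11], 12), ([11, 12], 13), ([11, 12, 13], 21), ([11, 12, 13, 21], 22)]

# Completion-only training: the model still reads the instruction as context,
# but only the completion tokens are targets.
ids, labels = build_labels(prompt_ids, completion_ids, train_on_prompt=False)
print(shifted_pairs(ids, labels))
# [([11, 12, 13], 21), ([11, 12, 13, 21], 22)]
```

In both cases the inputs are identical; the only difference is which positions contribute to the loss, which is why the same next-token objective covers both settings.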


I have the same question. Can you show me how to check what the dataset looks like after the “text” field is passed to the trainer?