Fine-tune with SFTTrainer


SFT (supervised fine-tuning) is called supervised because we collect the data from humans. However, we still train the model with the same cross-entropy loss as during pre-training (i.e. predicting the next token).
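To make the loss concrete, here is a minimal sketch in plain Python. It is not how TRL computes the loss internally; the function name `next_token_loss`, the toy 3-token vocabulary, and the probabilities are all made up for illustration. It only shows what "cross-entropy on the next token" means: at each position, take the negative log of the probability the model assigned to the token that actually follows.

```python
import math

def next_token_loss(probs_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    probs_per_position[t] is a (hypothetical) model's probability
    distribution over the vocabulary after reading token_ids[: t + 1].
    """
    targets = token_ids[1:]  # the token that follows each position
    losses = [
        -math.log(probs[target])
        for probs, target in zip(probs_per_position, targets)
    ]
    return sum(losses) / len(losses)

# Toy example: vocabulary of 3 tokens, sequence of 3 token ids.
token_ids = [0, 2, 1]
probs = [
    [0.1, 0.1, 0.8],  # after token 0: 0.8 on the true next token (2)
    [0.2, 0.5, 0.3],  # after tokens 0, 2: 0.5 on the true next token (1)
]
loss = next_token_loss(probs, token_ids)
print(f"mean next-token loss: {loss:.4f}")
```

The loss goes down as the model puts more probability on the tokens that actually come next, which is the same objective in pre-training and in SFT. Only the data changes.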

The difference is that we now make it more likely that the model generates a useful completion given an instruction. For instance, given “what are 10 things to do in London”, the model should learn to generate “in London, you can visit (…)”.

Since the model is still trained to predict the next token, we just concatenate the instruction and completion in a single “text” column. The labels can then be created by shifting the inputs by one position, so that the target at each position is the next token (as is done during pre-training). One can then decide to train the model only on the completions rather than on the instructions, but the default SFTTrainer of TRL trains the model to predict both instructions and completions.
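The label construction can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not TRL's actual implementation: `build_example` and the token ids are hypothetical, and the shift is done explicitly here to make it visible. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, which is why it is commonly used to exclude positions from the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_example(instruction_ids, completion_ids, completion_only=False):
    """Concatenate instruction and completion, then build next-token labels.

    Hypothetical helper for illustration only.
    """
    ids = instruction_ids + completion_ids
    inputs = ids[:-1]  # every token except the last one
    labels = ids[1:]   # at each position, the token that comes next
    if completion_only:
        # Labels that are still instruction tokens get masked out, so only
        # completion tokens contribute to the loss.
        n_masked = max(len(instruction_ids) - 1, 0)
        labels = [IGNORE_INDEX] * n_masked + labels[n_masked:]
    return inputs, labels

# Made-up token ids: 3 instruction tokens followed by 2 completion tokens.
instruction_ids = [11, 12, 13]
completion_ids = [21, 22]

inputs, full_labels = build_example(instruction_ids, completion_ids)
_, masked_labels = build_example(
    instruction_ids, completion_ids, completion_only=True
)

print(inputs)         # [11, 12, 13, 21]
print(full_labels)    # [12, 13, 21, 22]
print(masked_labels)  # [-100, -100, 21, 22]
```

With `completion_only=False`, the model is trained to predict instruction tokens as well as completion tokens. With `completion_only=True`, the instruction still serves as context, but only the completion tokens are scored.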