Fine-tune with SFTTrainer

Hi,

So SFT (supervised fine-tuning) is called supervised because the training data (instruction–completion pairs) is collected from humans. However, we're still training the model with the same cross-entropy loss as during pre-training (i.e. predicting the next token).
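To make that concrete, here is a minimal sketch in plain Transformers (the "gpt2" checkpoint and the example text are just placeholders): passing `labels=input_ids` makes the model compute exactly this next-token cross-entropy.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "What are 10 things to do in London? In London, you can visit ..."
inputs = tokenizer(text, return_tensors="pt")

# Passing the input_ids as labels makes the model compute the standard
# next-token cross-entropy; the one-position shift between inputs and
# targets happens inside the model's forward pass.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # scalar cross-entropy over the sequence
```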

We now simply make it more likely that the model will generate a useful completion given an instruction. Given "what are 10 things to do in London", for instance, the model should learn to generate "in London, you can visit (…)".

Since the model is still trained to predict the next token, we just concatenate the instruction and completion into a single "text" column; the labels are then created from the inputs shifted by one position (as is done during pre-training). One can decide to train the model only on the completions rather than on the instructions as well, but by default TRL's SFTTrainer trains the model to predict both instructions and completions.
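For concreteness, a minimal sketch of that default setup could look like this (the checkpoint name and the toy dataset are placeholders; depending on your TRL version, options like `dataset_text_field` may need to be passed through `SFTConfig` instead of directly to the trainer):

```python
from datasets import Dataset
from trl import SFTTrainer

# Toy dataset: instruction and completion concatenated into one "text" column.
dataset = Dataset.from_dict({
    "text": [
        "What are 10 things to do in London? In London, you can visit ...",
    ]
})

trainer = SFTTrainer(
    model="gpt2",               # placeholder; any causal LM checkpoint
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the concatenated text
)
trainer.train()
```

If you only want the loss computed on the completions, older TRL versions provide `DataCollatorForCompletionOnlyLM`, which masks the instruction tokens out of the labels so that only completion tokens contribute to the cross-entropy.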
