How does SFTTrainer behave during evaluation?

I was wondering how SFTTrainer handles the instruction-formatted evaluation data that is passed to it.

Assume that we have a QA dataset and format it as shown below.

from trl import DataCollatorForCompletionOnlyLM

def formatting_prompts_func(example):
    # Build one prompt string per example in the batch.
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

# Everything up to (and including) the response template will be ignored by the loss.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
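
For completeness, this is roughly how I wire those pieces into SFTTrainer (a sketch based on the TRL examples; it assumes model, tokenizer, and the train/eval datasets are already loaded):

from trl import SFTTrainer

trainer = SFTTrainer(
    model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()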

Thanks to the data collator, the loss will only be calculated for the tokens that appear after the “### Answer:” part of each example.
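
To make that concrete, here is an illustrative sketch (not TRL's actual implementation) of the masking I understand the collator to perform: it tokenizes the response template, looks for it in each example's input_ids, and sets every label before the answer to -100, the ignore index of PyTorch's cross-entropy loss.

def mask_prompt_tokens(input_ids, response_template_ids, ignore_index=-100):
    # input_ids and response_template_ids are plain Python lists of token ids.
    labels = list(input_ids)
    # Find where the tokenized response template starts in this example.
    start = None
    for i in range(len(input_ids) - len(response_template_ids) + 1):
        if input_ids[i:i + len(response_template_ids)] == response_template_ids:
            start = i + len(response_template_ids)
            break
    if start is None:
        # Template not found: ignore the whole example.
        return [ignore_index] * len(labels)
    # Tokens up to and including the template do not contribute to the loss.
    for i in range(start):
        labels[i] = ignore_index
    return labels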

In the training phase we use so-called teacher forcing (the ground-truth tokens are fed as input at every position, instead of the model's output from the previous time step), which helps the loss converge.
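
As a minimal sketch of what teacher forcing looks like with a Hugging Face causal LM (assuming model and tokenizer are already loaded; the example text is illustrative):

text = "### Question: What is 2 + 2?\n ### Answer: 4"
batch = tokenizer(text, return_tensors="pt")

# The ground-truth tokens serve both as input and (shifted internally by the
# model) as labels: at every position the model predicts the next ground-truth
# token, regardless of what it would have generated on its own.
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])
loss = outputs.loss  # teacher-forced next-token cross entropy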

But is this also the case in the evaluation phase? Do we use the ground truth as input instead of the model's generations from previous steps? I would guess yes, because otherwise the evaluation loss would be very volatile, and I do not observe that. It would mean we perform the same operations as on the training set, except that the model's parameters are not updated based on its performance on the evaluation set.
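
If that guess is correct, the evaluation step would look roughly like this sketch (reusing the batch from above; in practice the labels would be the collator-masked version rather than the raw input_ids): a single teacher-forced forward pass, no call to model.generate(), and no backward pass or optimizer update.

import torch

model.eval()
with torch.no_grad():
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])
eval_loss = outputs.loss  # same teacher-forced cross entropy as in training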
