SFTTrainer loss function and formatting_func

I have found the origin of the “apparent” model collapse:

1- the training itself was fine, BUT…

2- apparently, SFTTrainer leaves the model in training mode after training. And in my notebook, I did not persist the model before running the evaluation, nor did I call model.eval().

Result: the model was still in training mode, with dropout active and so on, so it predicted the first token and then looped on it until the max length was reached.
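For anyone hitting the same issue, here is a minimal sketch of the fix, assuming `trainer` is the SFTTrainer and `tokenizer` comes from the earlier training cells (the prompt is just a placeholder):

```python
import torch

# Assumed setup: `trainer` (an SFTTrainer) and `tokenizer`
# are already defined in earlier notebook cells.
trainer.train()

model = trainer.model
model.eval()  # switch back to inference mode: disables dropout etc.

prompt = "Hello"  # placeholder prompt for illustration
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```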

The HF documentation could be updated to mention this SFTTrainer behavior, if it does not already.

On the Google side, we will update the Colab material too, since this behavior is not mentioned there.

Thanks everyone for your support and help :wink:

Jerome
