SFTTrainer Loss function

I have a couple of questions:

  1. How can I find out which loss function SFTTrainer uses by default for a given model, and how can I change it?

  2. When training an LLM, the loss is computed over the whole concatenated sequence (prompt plus output). How can I change this so that the loss is computed only on the output tokens?


Hi,
I would recommend looking at DataCollatorForCompletionOnlyLM from Hugging Face's TRL library for training an LLM on outputs only!

The loss function used is the cross-entropy loss. It is defined within the model itself, e.g. in the modeling code for Llama. If you want to use your own custom loss function, you can override the compute_loss method of the Trainer, as explained in the Trainer documentation.
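To make the cross-entropy answer above concrete, here is a minimal plain-Python sketch of a next-token cross-entropy for a single sequence. It is only an illustration, not the library code: the function name `causal_lm_loss` is made up for this example, and I am assuming the usual conventions of shifting the labels by one position and skipping label value -100.

```python
import math

IGNORE_INDEX = -100  # assumed convention: labels with this value are skipped


def causal_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean next-token cross-entropy for one sequence (illustrative only).

    logits: list of T score lists, each of vocabulary size V
    labels: list of T token ids; entries equal to ignore_index are skipped
    """
    total, count = 0.0, 0
    # Position t predicts token t + 1, so pair logits[:-1] with labels[1:].
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == ignore_index:
            continue
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-probability of the target token.
        total += log_norm - scores[target]
        count += 1
    # Average over the positions that were actually scored.
    return total / count if count else 0.0
```

A custom loss passed through an overridden compute_loss would replace this kind of computation with your own, using the model's logits and the batch labels.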

  2. When training an LLM, the loss is computed over the whole concatenated sequence (prompt plus output). How can I change this so that the loss is computed only on the output tokens?

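Indeed, as recommended above, DataCollatorForCompletionOnlyLM can be used for this purpose. It sets the labels of the prompt tokens to -100, which the cross-entropy loss ignores, so only the output tokens contribute to the loss.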
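The idea behind completion-only training can be sketched in a few lines of plain Python. This is not the TRL implementation, just a rough illustration of the masking step: `mask_prompt_tokens` and the token ids in the usage example are hypothetical, and I am assuming that the response starts right after a known "response template" token sequence.

```python
IGNORE_INDEX = -100  # assumed convention: labels with this value are skipped by the loss


def mask_prompt_tokens(input_ids, response_template_ids, ignore_index=IGNORE_INDEX):
    """Build labels that keep only the tokens after the response template.

    input_ids: token ids of the full example (prompt + template + output)
    response_template_ids: token ids marking where the output begins
    """
    n = len(response_template_ids)
    labels = list(input_ids)
    # Look for the first occurrence of the template in the sequence.
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            end = start + n
            # Ignore the prompt and the template itself.
            labels[:end] = [ignore_index] * end
            return labels
    # Template not found: ignore the whole example rather than train on the prompt.
    return [ignore_index] * len(input_ids)


# Hypothetical ids: 90, 91 stand for the tokens of something like "### Answer:".
example = [5, 6, 7, 90, 91, 8, 9]
labels = mask_prompt_tokens(example, [90, 91])
```

Here `labels` keeps only the last two ids (8 and 9) and marks everything before them with -100, so the loss is computed on the output tokens only.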
