I recently noticed that the forward() of newly released LLMs in transformers (such as Llama) replaces the explicit CrossEntropyLoss with a call to self.loss_function when computing the next-token prediction loss. However, the forward() of older language models such as GPT2 remains unchanged.
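To make the question concrete, here is a simplified sketch of the two patterns I am comparing. This is paraphrased, not the actual library code: the helper name gpt2_style_loss is mine, and the exact signatures depend on the transformers version.

```python
import torch
from torch import nn

# Old style (e.g., GPT2LMHeadModel.forward): the criterion is
# hard-coded inline in forward().
def gpt2_style_loss(lm_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Shift so that tokens < n predict token n.
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss_fct = nn.CrossEntropyLoss()
    return loss_fct(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )

# New style (e.g., LlamaForCausalLM.forward): the model delegates to an
# attribute instead of writing the criterion out, roughly:
#     loss = self.loss_function(logits=logits, labels=labels,
#                               vocab_size=self.config.vocab_size, **kwargs)

# Quick check of the old-style helper with random data:
logits = torch.randn(2, 8, 32000)          # (batch, seq_len, vocab_size)
labels = torch.randint(0, 32000, (2, 8))   # token ids used as targets
print(gpt2_style_loss(logits, labels))
```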
I wonder what the difference between the two is. Thanks in advance!