I am very new to deep learning and am currently training the T5 model available here: Einmalumdiewelt/T5-Base_GNAD · Hugging Face.
It aims to summarize texts.
I have read that T5 models use a cross entropy loss function by default, but that for specific tasks such as text generation a negative log-likelihood loss function should be used. There is no documentation about this, and I am struggling to figure out which one I am using!
Do you have any idea what this loss function could be?
Thanks in advance!
Cross entropy loss and negative log-likelihood loss are almost the same thing conceptually and are often used interchangeably. The practical difference is just whether or not your model has a
LogSoftmax layer at the end (to convert logits to log probs) or whether you do that inside the loss function itself.
In PyTorch, CrossEntropyLoss is equal to LogSoftmax followed by
NLLLoss, so you should use CrossEntropyLoss if the model outputs have not been converted into log probs.
On the other hand, NLLLoss assumes the model outputs have already been fed through a LogSoftmax layer. As the docs say:
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.
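To make the equivalence concrete, here is a small PyTorch sketch (random logits and targets, just for illustration) showing that CrossEntropyLoss on raw logits gives the same value as NLLLoss applied after a log-softmax:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)            # batch of 4, 10 classes (raw logits)
targets = torch.randint(0, 10, (4,))   # class index per example

# CrossEntropyLoss works directly on logits...
ce = torch.nn.CrossEntropyLoss()(logits, targets)

# ...while NLLLoss expects log-probabilities (logits passed through log_softmax)
nll = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=-1), targets)

assert torch.allclose(ce, nll)  # identical up to floating-point precision
```

So the choice between the two is purely a question of where the log-softmax happens, not of which quantity is being optimized.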
In the Huggingface T5 implementation, the logits are not passed through a
LogSoftmax at the end (e.g. see here), so when using Huggingface you want to just use CrossEntropyLoss.
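As a sketch of what happens inside the Huggingface implementation when you pass `labels` to the model: the raw logits of shape `(batch, seq_len, vocab_size)` are flattened and fed to `CrossEntropyLoss` with `ignore_index=-100`, so padded label positions don't contribute to the loss. The logits and labels below are random stand-ins, not real model outputs:

```python
import torch

# Stand-in logits as a T5 model would produce: (batch, seq_len, vocab_size)
batch, seq_len, vocab = 2, 5, 32
logits = torch.randn(batch, seq_len, vocab)

labels = torch.randint(0, vocab, (batch, seq_len))
labels[0, -2:] = -100  # padding positions are conventionally masked with -100

# Roughly what the HF T5 forward pass does internally when labels are given:
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fct(logits.view(-1, vocab), labels.view(-1))
```

So the `loss` returned by the model during fine-tuning is already this cross entropy; there is nothing extra you need to configure.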