I am very new to deep learning and am currently training the T5 model available here: Einmalumdiewelt/T5-Base_GNAD · Hugging Face.
It aims to summarize texts.
I have read that T5 models use a cross entropy loss function by default, but that for specific tasks such as text generation a negative log-likelihood loss function should be used. There is no documentation about this, and I am struggling to figure out which one I am using!
Do you have any idea what this loss function could be?
Thanks in advance!
Cross entropy loss and negative log-likelihood loss are almost the same thing conceptually and are often used interchangeably. The practical difference is just whether or not your model has a
LogSoftmax layer at the end (to convert logits to log probs) or whether you do that inside the loss function itself.
In PyTorch, CrossEntropyLoss is equal to LogSoftmax followed by
NLLLoss, so you should use CrossEntropyLoss if the model outputs have not been converted into log probs.
On the other hand, NLLLoss assumes the model outputs have already been fed through a LogSoftmax layer. As the docs say:
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.
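To make the equivalence concrete, here is a small PyTorch sketch (random logits and targets, just for illustration) showing that CrossEntropyLoss on raw logits gives the same value as NLLLoss applied after a log-softmax:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)            # batch of 4, 10 classes (raw logits)
targets = torch.randint(0, 10, (4,))   # class index per example

# CrossEntropyLoss works directly on logits...
ce = torch.nn.CrossEntropyLoss()(logits, targets)

# ...while NLLLoss expects log-probabilities (logits passed through log_softmax)
nll = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=-1), targets)

assert torch.allclose(ce, nll)  # identical up to floating-point precision
```

So the choice between the two is purely a question of where the log-softmax happens, not of which quantity is being optimized.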
In the Huggingface T5 implementation, the logits are not passed through a
LogSoftmax at the end (e.g. see here), so when using Huggingface you want to just use CrossEntropyLoss.
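As a sketch of what happens inside the Huggingface implementation when you pass `labels` to the model: the raw logits of shape `(batch, seq_len, vocab_size)` are flattened and fed to `CrossEntropyLoss` with `ignore_index=-100`, so padded label positions don't contribute to the loss. The logits and labels below are random stand-ins, not real model outputs:

```python
import torch

# Stand-in logits as a T5 model would produce: (batch, seq_len, vocab_size)
batch, seq_len, vocab = 2, 5, 32
logits = torch.randn(batch, seq_len, vocab)

labels = torch.randint(0, vocab, (batch, seq_len))
labels[0, -2:] = -100  # padding positions are conventionally masked with -100

# Roughly what the HF T5 forward pass does internally when labels are given:
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fct(logits.view(-1, vocab), labels.view(-1))
```

So the `loss` returned by the model during fine-tuning is already this cross entropy; there is nothing extra you need to configure.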