Encoder-Decoder Loss

Padding tokens in the labels should be replaced by -100 so that the cross-entropy loss ignores them when computing the loss: -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
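
A minimal sketch of that masking step, assuming the labels come from a Hugging Face tokenizer (the model name and example texts below are just placeholders for illustration):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # any tokenizer works; placeholder choice
target_texts = ["a short target", "a slightly longer target sentence"]

labels = tokenizer(target_texts, padding=True, return_tensors="pt").input_ids
labels[labels == tokenizer.pad_token_id] = -100  # positions set to -100 are skipped by the loss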

The loss itself is computed like this (note the shift, so that each position predicts the next token):

from torch.nn import CrossEntropyLoss

# prediction_scores are the decoder logits, shape (batch, seq_len, vocab_size)
shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()  # drop the last position
labels = labels[:, 1:].contiguous()  # drop the first token so labels line up with next-token predictions
loss_fct = CrossEntropyLoss()  # ignore_index defaults to -100, so padded label positions are skipped
lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
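
Put together, a self-contained toy version of the same computation might look like this (the shapes and values are made up; prediction_scores stands in for the decoder's logits):

import torch
from torch.nn import CrossEntropyLoss

batch_size, seq_len, vocab_size = 2, 6, 32  # toy dimensions
prediction_scores = torch.randn(batch_size, seq_len, vocab_size)  # stand-in for the decoder logits
labels = torch.randint(0, vocab_size, (batch_size, seq_len))
labels[:, -2:] = -100  # pretend the last two positions are padding

shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
shifted_labels = labels[:, 1:].contiguous()
loss_fct = CrossEntropyLoss()  # ignore_index=-100 by default
lm_loss = loss_fct(shifted_prediction_scores.view(-1, vocab_size), shifted_labels.view(-1))
print(lm_loss)  # scalar loss averaged over the non-padding positions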