Encoder Decoder Loss

Hey, sorry for not replying earlier. The basic reason is that when the tokenizer encodes the target, it produces something like "<START> My decoded sentence <END>". The decoder transformer's output only has to predict "My decoded sentence <END>".
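As a rough sketch (the actual tokens depend on your tokenizer; the sentence and token names here are just illustrative):

```python
# What the decoder receives vs. what it is trained to predict (hypothetical tokens)
decoder_input   = ["<START>", "My", "decoded", "sentence"]   # fed into the decoder
expected_output = ["My", "decoded", "sentence", "<END>"]     # target at each position
```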

So the logits predict the tokens shifted by one (i.e. everything except the <START> token). And the reason we take all logits except the last one is that whatever it predicts after <END> is nonsensical, so we simply drop it.
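Here's a minimal sketch of that shift in the loss computation, assuming PyTorch and using random tensors in place of real model outputs and token ids:

```python
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len = 100, 2, 6

# Stand-ins for the decoder output and the tokenized "<START> ... <END>" sequence
logits = torch.randn(batch, seq_len, vocab_size)             # one prediction per input position
input_ids = torch.randint(0, vocab_size, (batch, seq_len))   # includes <START> and <END>

# Shift so that the logits at position i are scored against the token at position i + 1
shift_logits = logits[:, :-1, :]   # drop the last prediction: it comes after <END> and is nonsensical
shift_labels = input_ids[:, 1:]    # drop <START>: it is never a prediction target

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)
```

The two slices are what "shifted by one" means in practice: every position predicts the next token, and the leftover prediction at the very end has no label to compare against.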
