Hi @sachin
the EncoderDecoder model computes the standard auto-regressive cross-entropy loss from the labels (i.e. the output sequence). It just shifts the labels inside the model before computing the loss.
It’s the same loss used in other seq2seq models like BART, T5, and decoder models like GPT2.
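To make the shift concrete, here is a minimal PyTorch sketch of that loss computation. The tensors (`logits`, `labels`) are placeholders standing in for a model's decoder output and your target token ids, not actual transformers API calls; the shapes and the shift-by-one are the part that mirrors what happens inside the model.

```python
import torch
import torch.nn.functional as F

# Placeholder tensors standing in for real model outputs / targets.
vocab_size = 10
batch, seq_len = 2, 5
logits = torch.randn(batch, seq_len, vocab_size)        # decoder scores
labels = torch.randint(0, vocab_size, (batch, seq_len))  # target token ids

# Shift by one position so the model at step i predicts token i+1 —
# the standard auto-regressive setup.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()

# Flatten and compute token-level cross-entropy, as in seq2seq/decoder models.
loss = F.cross_entropy(
    shift_logits.view(-1, vocab_size),
    shift_labels.view(-1),
)
```

In practice you never write this yourself: passing `labels` to the model's `forward` returns the loss directly in the output.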
Hope this helps.