Encoder Decoder Loss

Hi @sachin

the EncoderDecoder model calculates the standard auto-regressive cross-entropy loss using the labels i.e the output sequence. It just shifts the labels inside the models before computing the loss.

It’s the same loss used in other seq2seq models like BART, T5, and decoder models like GPT2.

Hope this helps.