Seq2Seq Loss computation in Trainer

That’s actually a mistake in the documentation: it should say “by shifting the labels” instead of “by shifting the input_ids”. Can you open a PR to fix this?
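To make the distinction concrete, here is a minimal plain-Python sketch of how decoder inputs can be built by shifting the labels one position to the right (teacher forcing). It assumes the common convention that padded label positions hold -100 so the loss ignores them. The function name `shift_tokens_right` mirrors a helper found in some Transformers model files, but this is an illustrative re-implementation, not the library's code. The token ids, `pad_token_id=1` and `decoder_start_token_id=2`, are hypothetical values.

```python
from typing import List

IGNORE_INDEX = -100  # assumed label value that the cross-entropy loss skips


def shift_tokens_right(
    labels: List[List[int]], pad_token_id: int, decoder_start_token_id: int
) -> List[List[int]]:
    """Build decoder_input_ids by shifting each label sequence one step right."""
    shifted = []
    for seq in labels:
        # Prepend the start token and drop the last label, so position t
        # of the decoder input is the label from position t - 1.
        row = [decoder_start_token_id] + seq[:-1]
        # Label positions masked with -100 become padding in the decoder input.
        row = [pad_token_id if tok == IGNORE_INDEX else tok for tok in row]
        shifted.append(row)
    return shifted


labels = [[42, 17, 2, IGNORE_INDEX, IGNORE_INDEX]]
print(shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2))
# [[2, 42, 17, 2, 1]]
```

The loss is then computed between the decoder's predictions and the original, unshifted labels, so each position is trained to predict the next label token. The encoder's input_ids play no part in this shift.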

Sure, I will :wink:

It seems the implementation itself is correct, then :slight_smile:

Yes, now everything makes sense, thank you!