Thanks, that’s a clear and succinct explanation!
But I guess my question about decoder_input_ids still stands. If it’s built from labels (see my other question), that would mean, if I understand correctly, that the labels (shifted right) are used on the decoder side during computation, no?
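
To make the question concrete, here is a minimal sketch of what I mean by "shifted right". It uses plain Python lists rather than tensors, and the helper name `shift_right`, the token ids, and the choice of 0 for both the start and pad token are my own assumptions for illustration, not the library's actual code. I'm also assuming ignored positions in labels are marked with -100.

```python
def shift_right(labels, decoder_start_token_id, pad_token_id, ignore_index=-100):
    """Hypothetical helper: build decoder inputs from labels.

    Prepends the start token, drops the last label, and replaces any
    ignore_index marker (assumed to be -100) with the pad token.
    """
    shifted = [decoder_start_token_id] + list(labels[:-1])
    return [pad_token_id if tok == ignore_index else tok for tok in shifted]


# Made-up token ids; the trailing -100 stands for a padded position.
labels = [8774, 296, 1, -100]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
print(decoder_input_ids)
```

If that is roughly what happens, then at step t the decoder sees the label from step t-1 as input and is trained to predict the label at step t.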