How are you preparing labels?
decoder_input_ids are the labels shifted one position to the right, starting with the pad token.
This is how you should prepare decoder_input_ids for T5:
decoder_input_ids = labels.new_zeros(labels.shape)
decoder_input_ids[..., 1:] = labels[..., :-1].clone()
decoder_input_ids[..., 0] = self.pad_token_id
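
To see what the shift does to concrete token ids, here is a small dependency-free sketch that mirrors the tensor code above on plain Python lists. The helper name shift_right, the example ids, and the pad id of 0 are assumptions for illustration; check tokenizer.pad_token_id for your model.

```python
# Hypothetical helper (not part of any library): same shift as the tensor
# code above, written for plain lists of token ids.
PAD_TOKEN_ID = 0  # assumption: verify against tokenizer.pad_token_id


def shift_right(labels, pad_token_id=PAD_TOKEN_ID):
    """Prepend the pad token to each row and drop the row's last token."""
    return [[pad_token_id] + row[:-1] for row in labels]


# Made-up label ids for two target sequences of equal (padded) length.
labels = [[37, 1712, 19, 1], [100, 19, 1, 0]]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [[0, 37, 1712, 19], [0, 100, 19, 1]]
```

At step t the decoder sees the label from step t-1 and is trained to predict the label at step t, which is why the first input is a pad token and the last label is dropped.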
Consider using examples/seq2seq for seq2seq experiments; it does lots of things for you, including this.