T5 fine-tuning: loss difference when using labels and decoder_input_ids

hello,

I am fine-tuning T5 for summarization on the news summary dataset.

I am encountering a strange problem when computing the loss of T5 during training: I get different results when I compute the loss passing only the ‘labels’ parameter versus when I pass both ‘labels’ and ‘decoder_input_ids’.
I find the documentation a bit misleading here. In the ‘Training’ section, it says:

the forward function automatically creates the correct decoder_input_ids

Below you can see the different loss plots I get.

The green line (copper-rain) is when I do not pass any ‘decoder_input_ids’, i.e., the dataset class returns

def __getitem__(self, idx):
    item = {k: v[idx] for k, v in self.encodings.items()}
    item['labels'] = self.labels[idx]
    return item

The blue line (clean-cosmos) is when I do pass ‘decoder_input_ids’ along with ‘labels’ (following the example in this notebook), i.e.

def __getitem__(self, idx):
    item = {k: v[idx] for k, v in self.encodings.items()}
    item['labels'] = self.labels[idx]
    item['decoder_input_ids'] = item['labels']  # labels passed through unshifted
    return item
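
(For context, the encodings and labels come from the tokenizer, roughly along these lines; this is a simplified sketch, not my exact preprocessing code:)

from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# placeholder strings standing in for the actual articles and summaries
articles = ["first news article ...", "second news article ..."]
summaries = ["first summary", "second summary"]

encodings = tokenizer(articles, truncation=True, padding=True, return_tensors="pt")
labels = tokenizer(summaries, truncation=True, padding=True, return_tensors="pt").input_ids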

Do you have any explanation for this behaviour?

How are you preparing the labels?
decoder_input_ids should be the labels shifted one position to the right, starting with the pad token.
This is how you should prepare decoder_input_ids for T5:

# shift the labels one position to the right
decoder_input_ids = labels.new_zeros(labels.shape)
decoder_input_ids[..., 1:] = labels[..., :-1].clone()
# T5 uses the pad token as the decoder start token
decoder_input_ids[..., 0] = tokenizer.pad_token_id
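
Concretely, if the target sequence is w1 w2 w3 </s>, the alignment between the two is:

labels:             w1     w2   w3   </s>
decoder_input_ids:  <pad>  w1   w2   w3

so at each position the decoder only sees the previous target tokens when predicting the next one.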

Consider using examples/seq2seq for seq2seq experiments; it does lots of things for you, including this.

thank you for your help!

I see. Indeed, with this shift I now get exactly the same results as when passing ‘labels’ only (since I set the same seed). I was misled by the fact that I was getting a much lower loss when passing the labels as ‘decoder_input_ids’ as-is. Do you have any clue why I get this behaviour?

It also seems that ‘decoder_input_ids’ is not useful at all when ‘labels’ are passed to the model, since the forward creates them automatically anyway. I don't really understand why the notebook I linked passes it as a parameter.
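
edit: for anyone landing on this thread, here is a minimal sketch reproducing the three cases (t5-small and the example sentences are placeholders, not my actual data):

import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

inputs = tokenizer("summarize: the quick brown fox jumps over the lazy dog", return_tensors="pt")
labels = tokenizer("a fox jumps over a dog", return_tensors="pt").input_ids

with torch.no_grad():
    # 1) labels only: forward() right-shifts the labels internally to build decoder_input_ids
    loss_auto = model(**inputs, labels=labels).loss

    # 2) labels reused as decoder_input_ids unshifted: at every position the decoder
    #    sees the very token it is supposed to predict, so the loss is artificially low
    loss_unshifted = model(**inputs, labels=labels, decoder_input_ids=labels).loss

    # 3) manually right-shifted, as in the snippet above: matches case 1 exactly
    decoder_input_ids = labels.new_zeros(labels.shape)
    decoder_input_ids[..., 1:] = labels[..., :-1].clone()
    decoder_input_ids[..., 0] = tokenizer.pad_token_id
    loss_shifted = model(**inputs, labels=labels, decoder_input_ids=decoder_input_ids).loss

print(loss_auto, loss_unshifted, loss_shifted)  # loss_auto and loss_shifted match; loss_unshifted is much lower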