Hello,
I am fine-tuning T5 for summarization on the News Summary dataset.
I am encountering a weird problem when computing the loss during training: I get different results when I pass only the `labels` parameter than when I pass both `labels` and `decoder_input_ids`.
I find the documentation a bit misleading. In the 'Training' section, it says:

> the forward function automatically creates the correct `decoder_input_ids`
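
If I read the transformers source correctly, "automatically creates" means the labels are shifted one position to the right before being fed to the decoder. Here is a minimal sketch of my understanding, paraphrasing the internal `_shift_right` helper in `modeling_t5.py` (the token ids below are made up; for T5, `decoder_start_token_id` and `pad_token_id` are both 0):

```python
import torch

def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    shifted = labels.new_zeros(labels.shape)
    shifted[..., 1:] = labels[..., :-1].clone()  # drop the last token, shift the rest right
    shifted[..., 0] = decoder_start_token_id     # prepend the decoder start token
    # -100 is the loss-ignore index and must not be fed to the decoder
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted

labels = torch.tensor([[8774, 6, 296, 1]])  # made-up ids standing in for "hello, world</s>"
print(shift_right(labels))                  # tensor([[   0, 8774,    6,  296]])
```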
Below you can see the different loss plots I get.

The green line (copper-rain) is the run where I do not pass any `decoder_input_ids`, i.e., in the dataset class:
```python
def __getitem__(self, idx):
    item = {k: v[idx] for k, v in self.encodings.items()}
    # only labels: the model builds decoder_input_ids internally
    item['labels'] = self.labels[idx]
    return item
```
The blue line (clean-cosmos) is the run where I do pass the `decoder_input_ids` along with the `labels` (following the example in this notebook), i.e.:
```python
def __getitem__(self, idx):
    item = {k: v[idx] for k, v in self.encodings.items()}
    item['labels'] = self.labels[idx]
    # feed the (unshifted) labels directly as decoder inputs
    item['decoder_input_ids'] = item['labels']
    return item
```
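
To make the difference concrete outside of my training loop, here is a minimal script that reproduces it for me (`t5-small` and the example sentences are just for illustration):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

inputs = tokenizer("summarize: the quick brown fox jumps over the lazy dog",
                   return_tensors="pt")
labels = tokenizer("a fox jumps over a dog", return_tensors="pt").input_ids

with torch.no_grad():
    # variant 1: labels only -- the forward pass creates decoder_input_ids itself
    loss_labels_only = model(**inputs, labels=labels).loss
    # variant 2: the unshifted labels passed as decoder_input_ids, as in the notebook
    loss_both = model(**inputs, labels=labels, decoder_input_ids=labels).loss

print(loss_labels_only.item(), loss_both.item())  # these come out different for me
```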
Could anyone kindly explain this behaviour?