Thanks, @valhalla Suraj! This is very helpful!
Could you help me understand the difference between `forward` and `_step` in your example code:
```python
def forward(
    self, input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, lm_labels=None
):
    return self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        decoder_attention_mask=decoder_attention_mask,
        lm_labels=lm_labels,
    )

def _step(self, batch):
    lm_labels = batch["target_ids"]
    # replace pad token ids with -100 so the loss function ignores them
    lm_labels[lm_labels[:, :] == self.tokenizer.pad_token_id] = -100
    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        lm_labels=lm_labels,
        decoder_attention_mask=batch['target_mask']
    )
    loss = outputs[0]
    return loss
```
My understanding is that `self(...)` in `_step` runs the `forward` function defined above, and that `self.model(...)` in `forward` runs the `forward` function of the model returned by `T5ForConditionalGeneration.from_pretrained(hparams.model_name_or_path)`.
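Just to make my mental model concrete, here is a minimal sketch of how I picture the wiring (the class name and constructor details are my own guesses for illustration, not taken from your code):

```python
import pytorch_lightning as pl
from transformers import T5ForConditionalGeneration

class T5FineTuner(pl.LightningModule):  # class name is my guess
    def __init__(self, model_name_or_path):
        super().__init__()
        # self.model is the pretrained T5, so self.model(...) runs T5's own forward
        self.model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # self(...) inside _step lands here via nn.Module.__call__
        return self.model(input_ids, attention_mask=attention_mask, **kwargs)
```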
So to define my own loss function, I need to define it in `_step`, like this:
```python
def _step(self, batch):
    labels = batch["target_ids"]
    labels[labels[:, :] == self.tokenizer.pad_token_id] = -100
    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        labels=labels,
        decoder_attention_mask=batch['target_mask']
    )
    loss1 = outputs[0]
    beam_outputs = self.generate(xxxxxx)
    loss2 = my_metrics(beam_outputs)
    loss = loss1 + loss2
    return loss
```
Here I use `self.generate(xxxxxx)` rather than `self.model.generate(xxxxxx)`, because `self.model` is just the pretrained model loaded from the input checkpoint, right?
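Whichever receiver is correct, I imagine the beam-search call looking roughly like this (the arguments are placeholders I made up for the `xxxxxx` above, and I've written it against `self.model` here, though that is exactly the part I'm asking about):

```python
# hypothetical generation step; beam-search arguments are my own placeholders
beam_outputs = self.model.generate(
    input_ids=batch["source_ids"],
    attention_mask=batch["source_mask"],
    num_beams=4,
    max_length=64,
    early_stopping=True,
)
```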
Thanks!!!