Thanks, @valhalla Suraj! This is very helpful!
Could you help me understand the difference between `forward` and `_step` in your example code:
```python
def forward(
    self, input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, lm_labels=None
):
    return self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        decoder_attention_mask=decoder_attention_mask,
        lm_labels=lm_labels,
    )

def _step(self, batch):
    lm_labels = batch["target_ids"]
    # replace pad token ids with -100 so the loss function ignores them
    lm_labels[lm_labels[:, :] == self.tokenizer.pad_token_id] = -100
    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        lm_labels=lm_labels,
        decoder_attention_mask=batch['target_mask']
    )
    loss = outputs[0]
    return loss
```
My understanding is that `self(...)` in `_step` runs the `forward` function defined above, and that `self.model(...)` in `forward` runs the `forward` function of the model returned by `T5ForConditionalGeneration.from_pretrained(hparams.model_name_or_path)`.
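Just to make my mental model concrete, here is a minimal sketch of how I picture the wiring (the class name and constructor details are my own guesses for illustration, not taken from your code):

```python
import pytorch_lightning as pl
from transformers import T5ForConditionalGeneration

class T5FineTuner(pl.LightningModule):  # class name is my guess
    def __init__(self, model_name_or_path):
        super().__init__()
        # self.model is the pretrained T5, so self.model(...) runs T5's own forward
        self.model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # self(...) inside _step lands here via nn.Module.__call__
        return self.model(input_ids, attention_mask=attention_mask, **kwargs)
```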
So to define my own loss function, I need to define it in `_step`, like this:
```python
def _step(self, batch):
    labels = batch["target_ids"]
    labels[labels[:, :] == self.tokenizer.pad_token_id] = -100
    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        labels=labels,
        decoder_attention_mask=batch['target_mask']
    )
    loss1 = outputs[0]
    beam_outputs = self.generate(xxxxxx)
    loss2 = my_metrics(beam_outputs)
    loss = loss1 + loss2
    return loss
```
Here I use `self.generate(xxxxxx)` rather than `self.model.generate(xxxxxx)`, because `self.model` is just the pretrained model loaded from the input checkpoint, right?
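Whichever receiver is correct, I imagine the beam-search call looking roughly like this (the arguments are placeholders I made up for the `xxxxxx` above, and I've written it against `self.model` here, though that is exactly the part I'm asking about):

```python
# hypothetical generation step; beam-search arguments are my own placeholders
beam_outputs = self.model.generate(
    input_ids=batch["source_ids"],
    attention_mask=batch["source_mask"],
    num_beams=4,
    max_length=64,
    early_stopping=True,
)
```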
Thanks!!!