T5 user defined loss function

mengyahu · August 3, 2020, 12:20am

I am fine-tuning T5 for paraphrase generation and want to add a diversity measure for the generated sentences in the loss function. After reading the source code, I still have no clue how to add that.

mengyahu · August 3, 2020, 12:26am

I know I can generate multiple sentences using:
outs = model.generate(
    input_ids=batch['source_ids'].cuda(),
    attention_mask=batch['source_mask'].cuda(),
    max_length=maxlen,
    do_sample=True,
    top_k=120,
    top_p=0.99,
    early_stopping=True,
    num_return_sequences=num_return_seq,
)
and I know how to calculate my metrics based on this ‘outs’.
However, I don’t know how to find these outputs in the return value of the ‘forward’ function of ‘T5ForConditionalGeneration’.

Also, I couldn’t find the definition for this ‘generate’ function.

valhalla · August 3, 2020, 5:56am

Hi @mengyahu, T5ForConditionalGeneration won’t return the generations; you need to call generate yourself to get the outs. When you pass labels, it calculates the standard cross-entropy loss, here

generate is defined here
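
As a rough illustration of the "standard cross-entropy loss" mentioned above, here is a minimal sketch in plain PyTorch. It does not call T5 at all: the logits are random stand-ins for decoder outputs, and pad_token_id = 0 and the toy shapes are assumptions made for the example. It only shows how label positions set to -100 are skipped by the loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pad_token_id = 0   # assumed pad id for this toy example
vocab_size = 8

# Random stand-in for decoder logits: (batch=2, target_len=4, vocab_size)
logits = torch.randn(2, 4, vocab_size, requires_grad=True)
target_ids = torch.tensor([[5, 3, 1, 0],
                           [2, 1, 0, 0]])  # 0 marks padding

# Replace padding with -100 so those positions are ignored by the loss
labels = target_ids.clone()
labels[labels == pad_token_id] = -100

# Token-level cross-entropy, averaged over non-ignored positions
loss = F.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)

# The same value computed by hand over the non-padding positions only
keep = labels.view(-1) != -100
manual = F.cross_entropy(
    logits.view(-1, vocab_size)[keep], labels.view(-1)[keep]
)
print(loss.item(), manual.item())
```

The two printed numbers should agree, which is the point: the padded positions contribute nothing to the loss.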

mengyahu · August 3, 2020, 3:26pm

Thanks, @valhalla Suraj! This is very helpful!

Could you help me understand the difference between ‘forward’ and ‘_step’ in your example code:

def forward(
    self, input_ids, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, lm_labels=None
):
    return self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        decoder_attention_mask=decoder_attention_mask,
        lm_labels=lm_labels,
    )

def _step(self, batch):
    lm_labels = batch["target_ids"]
    lm_labels[lm_labels[:, :] == self.tokenizer.pad_token_id] = -100

    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        lm_labels=lm_labels,
        decoder_attention_mask=batch['target_mask']
    )

    loss = outputs[0]

    return loss

My understanding is ‘self(xxxxxx)’ in the ‘_step’ is running the ‘forward’ function defined above and the ‘self.model(xxxxxx)’ in the ‘forward’ function above is running the ‘forward’ function of T5ForConditionalGeneration.from_pretrained(hparams.model_name_or_path).

so to define my own loss function, I need to define it in the ‘_step’ like:

def _step(self, batch):
    labels = batch["target_ids"]
    labels[labels[:, :] == self.tokenizer.pad_token_id] = -100

    outputs = self(
        input_ids=batch["source_ids"],
        attention_mask=batch["source_mask"],
        labels=labels,
        decoder_attention_mask=batch['target_mask']
    )

    loss1 = outputs[0]

    beam_outputs = self.generate( xxxxxx )
    loss2 = my_metrics(beam_outputs)
    loss = loss1 + loss2
    return loss

Here I use self.generate( xxxxxx ) rather than self.model.generate(xx) because self.model is that pretrained model in the input, right?

Thanks!!!

valhalla · August 3, 2020, 4:41pm

Yes, you are right about the _step method. However, .generate can’t be called on self, because here self is an instance of the LightningModule; call self.model.generate(...) instead.
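
To make the call chain concrete, here is a small sketch with toy classes. ToyModel and Wrapper are hypothetical stand-ins (for the pretrained model and the LightningModule respectively), and the toy generate is just a greedy argmax, not the real generate. It only illustrates that self(...) dispatches to the wrapper's forward, while generate lives on the wrapped model.

```python
import torch


class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the wrapped pretrained model."""

    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(4, 10)

    def forward(self, x):
        return self.proj(x)

    def generate(self, x):
        # Toy "generation": greedily pick the highest-scoring id
        with torch.no_grad():
            return self.proj(x).argmax(dim=-1)


class Wrapper(torch.nn.Module):
    """Plays the role of the LightningModule in the thread."""

    def __init__(self):
        super().__init__()
        self.model = ToyModel()

    def forward(self, x):
        return self.model(x)

    def _step(self, x):
        logits = self(x)              # Wrapper.forward -> ToyModel.forward
        ids = self.model.generate(x)  # generate is defined on the inner model
        return logits, ids


wrapper = Wrapper()
logits, ids = wrapper._step(torch.randn(2, 4))
print(logits.shape, ids.shape, hasattr(wrapper, "generate"))
```

The wrapper itself has no generate attribute, so self.generate(...) would raise an AttributeError here.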

chrisdoyleIE · August 6, 2020, 4:56pm

Hi,

Just a tip to save you some hassle in the event that you did not already know what I’m about to say.

You’re going to hit a snag in your idea here if you try to pass gradients from this new loss, but of course it is fine for a logging metric.

Gradients cannot flow through a decoding step such as argmax, beam search, or nucleus sampling, because selecting discrete token ids is non-differentiable. If you train your model with this loss, it will have no bearing on your results.

loss = diversity_loss + lm_loss
loss.backward() # diversity_loss contributes no gradients, but your model will still train on lm_loss, so be careful: it is not impacting your training whatsoever!
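
A small sketch of this effect in plain PyTorch, under stated assumptions: proj is a toy linear layer standing in for the model's output layer, and the "diversity" score (fraction of distinct token ids) is a made-up metric for illustration. The gradients with and without the diversity term come out identical, because the term is computed from integer token ids that carry no gradient history.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden = 10, 4
proj = torch.nn.Linear(hidden, vocab_size)  # toy stand-in for the LM head
states = torch.randn(3, 5, hidden)
labels = torch.randint(0, vocab_size, (3, 5))


def grads(include_diversity):
    proj.zero_grad()
    logits = proj(states)
    lm_loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

    # argmax returns integer ids with no grad_fn
    token_ids = logits.argmax(dim=-1)
    # Hypothetical diversity penalty: 1 - fraction of distinct ids (plain float)
    diversity_loss = 1.0 - token_ids.unique().numel() / token_ids.numel()

    if include_diversity:
        loss = lm_loss + diversity_loss
    else:
        loss = lm_loss
    loss.backward()
    return proj.weight.grad.clone(), token_ids


grad_lm_only, token_ids = grads(include_diversity=False)
grad_with_div, _ = grads(include_diversity=True)
print(torch.allclose(grad_lm_only, grad_with_div))  # prints True
```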

mengyahu · August 6, 2020, 8:27pm

Thanks for the reminder, @chrisdoyleIE ! This is surprising to me! Could you provide me with a solution if I want diversity_loss to have influence on my model?

chrisdoyleIE · August 7, 2020, 8:21am

You need a differentiable model to do the sampling for you

Let V be the set of words in the vocabulary. Some models define a reinforcement learning model with a state space vector x with dimension |V|, such that x_i can be any integer in V, and a discrete action space of all integers in V.

Someone linked a paper from Salesforce which follows this general idea but adds a few useful bells and whistles.

mengyahu · August 7, 2020, 11:48am

Thanks! I linked the paper. They just defined the loss on the sampling, but did not provide code, so I am not sure how they did it.

I do not know what ‘a differentiable model to do the sampling’ is. Could you give me more details on this?

chrisdoyleIE · August 7, 2020, 2:55pm

It’s a whole field within itself and difficult to describe in a paragraph, but I’ll try to point you in the right direction.

Check out reinforcement learning first, then read that Salesforce paper with newfound vigour! The way to make sampling differentiable is to train a function to do this job, such that the input is your probability distribution, and the output is some index in the range [0, |V|).

Beyond this explanation, I’m afraid I can’t offer too much help. Check out some of the papers with reinforcement learning in them here
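
For readers looking for a starting point, one common reinforcement-learning-style approach is the score-function (REINFORCE) estimator. The sketch below is an assumption about what such a setup could look like, not the method of the paper mentioned above: proj is a toy stand-in for the decoder's output layer, and diversity_reward is a hypothetical reward (fraction of distinct tokens per sequence). The sampled ids are still non-differentiable, but their log-probabilities are, so weighting the log-probabilities by the reward gives a loss that does send gradients to the model.

```python
import torch

torch.manual_seed(0)
vocab_size, hidden, seq_len, batch = 12, 8, 6, 4
proj = torch.nn.Linear(hidden, vocab_size)  # toy stand-in for the LM head
states = torch.randn(batch, seq_len, hidden)


def diversity_reward(token_ids):
    # Hypothetical reward: fraction of distinct tokens in each sequence
    return torch.tensor(
        [len(set(row.tolist())) / row.numel() for row in token_ids]
    )


logits = proj(states)
dist = torch.distributions.Categorical(logits=logits)

sampled = dist.sample()                        # (batch, seq_len), no gradient
log_prob = dist.log_prob(sampled).sum(dim=-1)  # differentiable w.r.t. logits

reward = diversity_reward(sampled)             # treated as a constant
baseline = reward.mean()                       # simple variance-reduction baseline

# REINFORCE: raise the log-probability of samples with above-average reward
rl_loss = -((reward - baseline) * log_prob).mean()
rl_loss.backward()
print(rl_loss.item(), proj.weight.grad.shape)
```

With a real model, generate runs without gradient tracking, so the sampled sequences would need to be scored again with a normal forward pass to obtain differentiable log-probabilities. In practice this term is usually mixed with the ordinary cross-entropy loss rather than used alone.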

mengyahu · August 7, 2020, 2:59pm

Thanks! This guidance is very helpful!

peggy · September 23, 2020, 4:38pm

Hi. Thanks for your explanation.
Do you have a paper, or some keywords to search for, on what you said about training a function to do the sampling?
