Carrying Gradients Through Generate

chrisdoyleIE · July 15, 2020, 11:45am

Hi folks,

How would you recommend that I pass gradients through generate? Below is a rough code snippet explaining the objective.

I am thinking that I could take the hypo_ids directly from the model output (instead of from generate), but this is no longer natural because teacher-forcing is used to generate these.
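One way to picture the "take it from the model output" idea is to skip discrete ids altogether and compare soft embeddings. The following is only a minimal sketch under stated assumptions: `soft_semsim_loss` is a hypothetical helper, the random `logits` stand in for teacher-forced decoder logits, and `embedding` stands in for the model's token embedding matrix.

```python
import torch
import torch.nn.functional as F

def soft_semsim_loss(logits, target_ids, embedding):
    """1 - cosine similarity between soft hypothesis and target sentence vectors.

    logits:     (batch, seq_len, vocab) teacher-forced decoder logits
    target_ids: (batch, tgt_len) reference token ids
    embedding:  (vocab, dim) embedding matrix (assumed shared with the model)
    """
    probs = F.softmax(logits, dim=-1)         # (batch, seq_len, vocab)
    hypo_emb = probs @ embedding              # expected embedding per position
    target_emb = embedding[target_ids]        # (batch, tgt_len, dim)
    hypo_vec = hypo_emb.mean(dim=1)           # mean-pool over positions
    target_vec = target_emb.mean(dim=1)
    return (1 - F.cosine_similarity(hypo_vec, target_vec, dim=-1)).mean()

# Toy stand-ins for real model outputs (assumptions, not BART outputs)
torch.manual_seed(0)
vocab, dim = 11, 8
embedding = torch.randn(vocab, dim)
logits = torch.randn(2, 5, vocab, requires_grad=True)
target_ids = torch.randint(0, vocab, (2, 5))

loss = soft_semsim_loss(logits, target_ids, embedding)
loss.backward()                               # gradients reach the logits
```

Because no argmax or sampling step is involved, the loss stays differentiable, but the logits are still conditioned on the gold prefix, which is exactly the teacher-forcing caveat raised above.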

Thoughts?

Context from PyTorch Lightning implementation:


# import torch.nn as nn
# from transformers import BartForConditionalGeneration
# self.model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def forward(self, batch):
    return self.model(
        input_ids=batch["x"],
        decoder_input_ids=batch["decoder_inputs"],
        labels=batch["decoder_labels"],
    )

def training_step(self, batch, batch_id):
    """Want two losses: language modelling loss and semantic similarity loss."""

    # language modelling loss (first element of the output when labels are passed)
    outputs = self(batch)
    language_modelling_loss = outputs[0]

    # semantic similarity loss
    target_ids = batch["target_ids"]
    hypo_ids = self.model.generate(batch["x"])  # no gradients passed, of course
    # rough: assumes target_ids and hypo_ids have the same shape; ids cast to float
    semsim_loss = 1 - nn.CosineSimilarity(dim=0)(target_ids.float(), hypo_ids.float())

    return {"loss": language_modelling_loss + semsim_loss}

chrisdoyleIE · July 16, 2020, 11:16am

EDIT: The only method seems to be to use RL to simulate the sampling that occurs.

See https://papers.nips.cc/paper/8682-training-language-gans-from-scratch.pdf
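For illustration, the RL route usually means a score-function (REINFORCE) estimator: sample token ids, score them with a reward that need not be differentiable, and weight the log-probability of the sample by that reward. This is a minimal sketch, not the method from the linked paper; `reinforce_loss` and `toy_reward` are hypothetical names, and the per-step logits are a toy tensor rather than the output of a real autoregressive decoder.

```python
import torch

def reinforce_loss(logits, reward_fn):
    """Score-function (REINFORCE) surrogate loss for sampled token sequences.

    logits:    (batch, seq_len, vocab) per-step logits of the policy
    reward_fn: hypothetical callable mapping sampled ids (batch, seq_len)
               to rewards of shape (batch,); it need not be differentiable
    """
    dist = torch.distributions.Categorical(logits=logits)
    sampled_ids = dist.sample()                          # discrete, no gradient
    log_prob = dist.log_prob(sampled_ids).sum(dim=-1)    # (batch,)
    with torch.no_grad():
        reward = reward_fn(sampled_ids).float()
        advantage = reward - reward.mean()               # batch-mean baseline
    # Gradient flows only through log_prob, weighted by the advantage
    return -(advantage * log_prob).mean(), sampled_ids

# Toy reward (assumption): fraction of sampled tokens equal to id 3
def toy_reward(ids):
    return (ids == 3).float().mean(dim=-1)

torch.manual_seed(0)
logits = torch.zeros(16, 6, 10, requires_grad=True)
loss, ids = reinforce_loss(logits, toy_reward)
loss.backward()
```

With a real model, the sampled sequences would typically come from generate() and then be rescored with a normal forward pass to obtain log-probabilities that carry gradients.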

sshleifer · July 16, 2020, 5:14pm

@yjernite is also interested in this line of work.
I would write a method similar to ParlAI’s decode_forced that forces the model to decode the tgt sequence and estimates its probability, then backprop through the summed log-probability of the GT sequence. I’m not sure if that will lead to super similar results to the current teacher-forcing training approach, but it would be interesting to test!
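As a rough sketch of what such a forced-decoding score could look like (this is not ParlAI's implementation; `forced_sequence_log_prob` is a hypothetical helper, and the logits are assumed to come from a forward pass with the target sequence as decoder input):

```python
import torch
import torch.nn.functional as F

def forced_sequence_log_prob(logits, target_ids, pad_token_id=None):
    """Sum of log-probabilities the model assigns to each target token.

    logits:     (batch, seq_len, vocab) decoder logits from a forced pass
    target_ids: (batch, seq_len) ground-truth token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Pick out the log-probability of the ground-truth token at every step
    token_log_probs = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    if pad_token_id is not None:
        mask = (target_ids != pad_token_id).to(token_log_probs.dtype)
        token_log_probs = token_log_probs * mask          # ignore padding
    return token_log_probs.sum(dim=-1)                    # (batch,)

# Toy stand-ins for real model outputs (assumptions)
torch.manual_seed(0)
logits = torch.randn(3, 7, 20, requires_grad=True)
target_ids = torch.randint(0, 20, (3, 7))

seq_log_prob = forced_sequence_log_prob(logits, target_ids)
loss = -seq_log_prob.mean()
loss.backward()
```

Without padding, the summed score equals the negative summed token-level cross-entropy, which suggests the result may indeed be close to ordinary teacher-forced training unless the score is combined with something else.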

chrisdoyleIE · August 6, 2020, 5:06pm

I just tried a simple feed-forward network (FFNN) to replicate argmax, but found that the gradients are almost always zero. That makes sense, I guess: changing the other vector values will almost never change which entry is the maximum.
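A common workaround for this kind of problem, not something tried in this thread, is a straight-through Gumbel-softmax relaxation: the forward pass produces a one-hot choice while the backward pass uses the gradient of the soft distribution. A minimal sketch with toy tensors (the `embedding` matrix and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10, requires_grad=True)

# argmax returns integer indices that are detached from the autograd graph
hard_ids = logits.argmax(dim=-1)

# Straight-through Gumbel-softmax: one-hot forward, soft gradients backward
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)

# Use the one-hot rows to select embeddings differentiably
embedding = torch.randn(10, 8)       # toy embedding matrix (assumption)
token_emb = one_hot @ embedding      # (4, 8)
token_emb.sum().backward()           # gradients reach the logits
```

Note that the selected token here is a sample perturbed by Gumbel noise rather than the exact argmax, and the gradient estimate is biased, so this is a trade-off and not a drop-in replacement.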

patrickvonplaten · November 2, 2020, 12:55pm

This should also be interesting: Big `generate()` refactor

eduardoprea44 · January 29, 2023, 10:57pm

Hello,
I’m trying to do something similar. Did you manage to implement something working?
