How to output loss from model.generate()?

I need the probability distribution over generated words to calculate the loss for my own loss function.
In particular, my setup differs from a normal loss computation in that the model generates sentences the same way it does at test time (autoregressively, rather than with teacher forcing).

Therefore, I have tried to calculate the loss using model.generate(), but this method does not give me the computation graph needed to calculate the gradient. Could this be solved simply by passing a special argument to the method? Or is there an equally simple solution? Or do I have to implement my own function that generates the text in a way that retains the computation graph?

I can’t answer many of your questions, but I did find this code snippet useful to get a computational graph with generate():

from undecorated import undecorated
from types import MethodType

# strip the no_grad decorator from generate()
generate_with_grad = undecorated(model.generate)
# bind the undecorated function back onto the model as a new method
model.generate_with_grad = MethodType(generate_with_grad, model)

The generate() function has a no_grad decorator that stops the computational graph from being returned, and this code just removes the decorator while leaving the rest of the generate() function unchanged.
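
As a quick sanity check (a minimal sketch, assuming a model patched as above and an already-tokenized input_ids tensor), you can confirm the graph is retained by looking for a grad_fn on the returned scores:

out = model.generate_with_grad(
    input_ids=input_ids,
    output_scores=True,
    return_dict_in_generate=True,
)
print(out.scores[0].grad_fn)  # no longer None once the decorator is removed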


Thanks for sharing your simple solution! :heart_eyes:

I’ll give this method a try!

Thanks, @tomroth1001!
I got scores with the computational graph!


Hello, I tried this method to retain the computational graph and it works. However, when I try to backpropagate a loss computed from the generation scores, I get an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

The error is triggered somewhere in generation_logits_process.py.

Last few lines of the trace:

next_token_scores_processed = logits_processor(
  File "/home/halamvac/venvs/venv39/lib/python3.9/site-packages/transformers/generation_logits_process.py", line 92, in __call__
    scores = processor(input_ids, scores)
  File "/home/halamvac/venvs/venv39/lib/python3.9/site-packages/transformers/generation_logits_process.py", line 161, in __call__
    score = torch.gather(scores, 1, input_ids)

@mittu @tomroth1001 Did either of you manage to backpropagate the gradient?

Not sure, sorry. It worked for my case.
The cliché advice is to make sure all your packages are up to date and then try again; it might be a bug.

Thanks for the tip, but I already have the latest versions. May I ask which versions of PyTorch and transformers you are using?

I’m using transformers 4.19.2 and torch 1.11.0+cu113, and it still works without error.
Here is my minimal code:

from transformers import BartTokenizer, BartForConditionalGeneration
from undecorated import undecorated
from types import MethodType

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

# remove the no_grad decorator so the computation graph is kept
generate_with_grad = undecorated(model.generate)
model.generate_with_grad = MethodType(generate_with_grad, model)

# any tokenized input works here; this line is just an example
input_ids = tokenizer("Example article to summarize.", return_tensors="pt").input_ids

output = model.generate_with_grad(
    input_ids=input_ids,
    output_scores=True,
    return_dict_in_generate=True,
    output_hidden_states=True,
)
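
Building on that, here is a hedged sketch of turning those scores into a scalar loss and backpropagating. It assumes greedy decoding (num_beams=1); with beam search, output.scores has a different shape. The negative log-likelihood objective is only a toy example:

import torch

# greedy decoding keeps the per-step scores aligned with the returned sequence
output = model.generate_with_grad(
    input_ids=input_ids,
    num_beams=1,
    do_sample=False,
    output_scores=True,
    return_dict_in_generate=True,
)

# output.scores is a tuple with one (batch, vocab_size) logits tensor per step
logits = torch.stack(output.scores, dim=1)    # (batch, steps, vocab_size)
log_probs = torch.log_softmax(logits, dim=-1)

# output.sequences starts with the decoder start token; drop it to align with scores
gen_tokens = output.sequences[:, 1:]
token_logps = log_probs.gather(-1, gen_tokens.unsqueeze(-1)).squeeze(-1)

# toy objective: mean negative log-probability of the generated tokens
loss = -token_logps.mean()
loss.backward()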

According to the error message, there is an in-place operation somewhere, so it would be a good idea to work backwards, commenting out each computation line by line from the end to the beginning, to find which step causes the problem. Note that backward() is only possible for scalars, so for multidimensional tensors it is better to sum or average as appropriate.
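
As a complement to that manual search, PyTorch's anomaly detection can often point directly at the offending operation. A minimal sketch, reusing the model and input_ids from the snippets above:

import torch

# anomaly mode records forward-pass tracebacks, so the backward error
# reports which operation (e.g. an in-place one) caused the failure
with torch.autograd.detect_anomaly():
    output = model.generate_with_grad(
        input_ids=input_ids,
        output_scores=True,
        return_dict_in_generate=True,
    )
    # backward() requires a scalar, so reduce the scores first
    torch.stack(output.scores, dim=1).mean().backward()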