Hi, I’m using transformers to generate sentences while keeping their gradients, by calling `model.generate` with the `@torch.no_grad()` decorator removed from above `def generate(...):`, since the current version (4.2.1) of `model.generate` doesn’t support retaining gradients. Because I set `do_sample=True` and `num_beams > 1` in `generate`, the return type is `BeamSampleEncoderDecoderOutput`. According to the documentation, the `scores` of `BeamSampleEncoderDecoderOutput` consist of the log-softmax scores for each vocabulary token plus the sum of the log-softmax scores of the previously generated tokens in this beam.
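For context, here is a minimal sketch of how I obtain that output (the checkpoint, input, and generation arguments below are only illustrative, not my exact setup; in my real code the `@torch.no_grad()` decorator on `generate` is removed so the scores keep their computation graph):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

input_ids = tokenizer("An example input sentence.", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    do_sample=True,                 # sampling ...
    num_beams=4,                    # ... combined with beam search -> beam sample
    max_length=20,
    output_scores=True,             # ask generate to return the per-step scores
    return_dict_in_generate=True,   # return a BeamSampleEncoderDecoderOutput instead of a plain tensor
)

# outputs.scores is a tuple with one tensor per generation step,
# each of shape (batch_size * num_beams, vocab_size)
print(type(outputs).__name__)
print(len(outputs.scores), outputs.scores[-1].shape)
```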
What I want to do is gather the non-inf values from the scores of the last step and later apply gradient descent to train the network. The key pseudo-code is:
outputs = self.generate(input_ids, ..., **model_kwargs)
# The type of outputs is BeamSampleEncoderDecoderOutput
scores = outputs.scores                   # tuple with one (batch_size * num_beams, vocab_size) tensor per step
last_step_score = scores[-1]              # scores of the last generation step
last_step_score = last_step_score[last_step_score != -float('inf')]  # keep only the finite entries
last_step_score = last_step_score[::num_beams]                       # take every num_beams-th value
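Later on, the gathered scores are meant to drive an ordinary training step, roughly like this (the loss is only a placeholder and `optimizer` is assumed to be created elsewhere; this is not my exact objective):

```python
loss = -last_step_score.mean()   # placeholder objective: push the kept log-softmax scores up
optimizer.zero_grad()
loss.backward()                  # the error below is raised during this backward pass
optimizer.step()
```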
However, when I run the program, I receive an error:
RuntimeError:
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 50265]], which is output 0 of LogSoftmaxBackward, is at version 17; expected version 0 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
which means that there is an in-place operation somewhere inside `generate`. My guess is that it lies in `BeamSearchScorer.finalize`, but I can’t figure out which part of the source code I would need to change to make this work.
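Following the hint in the error message, I can enable anomaly detection to get a forward-pass traceback of the offending operation (this only helps locate it, it doesn’t fix anything):

```python
import torch

# Make the backward error report a traceback of the forward op whose output was
# later modified in place (noticeably slower, so only for debugging).
torch.autograd.set_detect_anomaly(True)
```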