Hi, I’m using transformers to generate sentences with gradients attached, by calling `model.generate` after one modification: removing the `@torch.no_grad()` decorator above `def generate(...):`, since the current version (4.2.1) of `model.generate` doesn’t support keeping gradients.
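For context, the change is just deleting the decorator in my local copy of `generation_utils.py` (paraphrased here, not the exact upstream source):

```python
# src/transformers/generation_utils.py (4.2.1), paraphrased:

# @torch.no_grad()   # <- removed so that tensors produced inside
#                    #    generate() keep their autograd history
def generate(self, input_ids=None, **model_kwargs):
    ...
```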
And because of the arguments I pass to `generate`, the return type is `BeamSampleEncoderDecoderOutput`. According to the docs, its `scores` field holds the processed beam scores for each vocabulary token at each generation step: the log-softmax score of each token plus the sum of the log-softmax scores of the previously generated tokens in that beam.
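For reference, here is a minimal sketch of a call that produces this output type (the model name and generation arguments are illustrative, not my exact setup):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative encoder-decoder model; any seq2seq model would do.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

input_ids = tokenizer("a test input", return_tensors="pt").input_ids

# do_sample=True with num_beams > 1 selects beam sampling;
# return_dict_in_generate=True and output_scores=True make generate()
# return a BeamSampleEncoderDecoderOutput with a .scores tuple.
outputs = model.generate(
    input_ids,
    num_beams=4,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,
)
print(type(outputs).__name__)  # BeamSampleEncoderDecoderOutput
```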
What I want to do is gather the non-inf values from the last beam scores and later apply gradient descent to train the network. The key pseudo-code is:
```python
# The type of outputs is BeamSampleEncoderDecoderOutput
outputs = self.generate(input_ids, ..., **model_kwargs)
scores = outputs.scores        # one (batch_size * num_beams, vocab_size) tensor per step
last_step_score = scores[-1]   # scores from the final generation step
last_step_score = last_step_score[torch.where(last_step_score != -float('inf'))]
last_step_score = last_step_score[::num_beams]
```
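followed by something like this training step (the loss here is just a stand-in for my real objective, and the optimizer is assumed to be set up elsewhere over the model’s parameters):

```python
# Hypothetical training step: push up the kept log-probabilities by
# minimizing their negative sum.
loss = -last_step_score.sum()
optimizer.zero_grad()
loss.backward()   # <- the RuntimeError below is raised here
optimizer.step()
```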
However, when I run the program, I receive an error:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 50265]], which is output 0 of LogSoftmaxBackward, is at version 17; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
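Following the hint, I can rerun with anomaly detection enabled to get a forward-pass traceback for the offending operation (a minimal sketch, reusing the model and input_ids from above):

```python
import torch

# Debug-only: makes autograd record forward stack traces, so the
# backward error also reports where the modified tensor was created.
torch.autograd.set_detect_anomaly(True)

outputs = model.generate(
    input_ids,
    num_beams=4,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,
)
last = outputs.scores[-1]
loss = -last[last != -float('inf')].sum()
loss.backward()  # the traceback now points into generate()
```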
which means there is an in-place operation somewhere in `generate`. I guess the in-place operation lies in `BeamSearchScorer.finalize`, but I can’t figure out what part of the source code to change to make this work.
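For reference, the error’s mention of LogSoftmaxBackward and the [16, 50265] shape also make me wonder whether the modified tensor is the per-step log_softmax output inside the beam-sample loop, which some logits processors fill with -inf in place. If that were the culprit, a possible (unconfirmed) workaround would be to clone that tensor before the processors touch it, roughly like this in `generation_utils.py` (paraphrased, not the exact upstream code):

```python
# Inside the beam-sample loop of generation_utils.py, paraphrased.
# Cloning keeps the log_softmax output at version 0, so autograd can
# still use it in backward even if processors later write -inf in place.
next_token_scores = F.log_softmax(next_token_logits, dim=-1)
next_token_scores = logits_processor(input_ids, next_token_scores.clone())
```

But I haven’t verified that this is the right place to change.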