A potential in-place operation that caused a RuntimeError

Hi, I’m using transformers to generate sentences while keeping gradients, by calling model.generate after removing the @torch.no_grad() decorator ahead of def generate(...):, since the current version (4.2.1) of model.generate doesn’t support retaining gradients.

Because I set do_sample=True and num_beams>1 in generate, the return type is BeamSampleEncoderDecoderOutput. According to the documentation, the scores of a BeamSampleEncoderDecoderOutput consist of the log-softmax score for each vocabulary token plus the sum of the log-softmax scores of the previously generated tokens in that beam.
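For reference, the call looks roughly like the sketch below. The model name and argument values are placeholders, not my actual settings; output_scores=True and return_dict_in_generate=True are what make generate return a BeamSampleEncoderDecoderOutput with a populated scores field.

outputs = model.generate(
    input_ids,
    do_sample=True,                 # sampling + beams -> beam sample decoding
    num_beams=4,                    # placeholder; any value > 1
    max_length=50,                  # placeholder
    output_scores=True,             # ask generate to return the per-step scores
    return_dict_in_generate=True,   # return a BeamSampleEncoderDecoderOutput instead of a plain tensor
)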

What I want to do is gather the non-inf values from the last step and later apply gradient descent to train the network (a sketch of the intended training step follows the code). The key pseudo-code is:

outputs = self.generate(input_ids, ..., **model_kwargs)
# The type of outputs is BeamSampleEncoderDecoderOutput
scores = outputs.scores
# Scores of the last generation step (one row per beam, over the vocabulary)
last_step_score = scores[-1]
# Keep only the finite (non -inf) entries
last_step_score = last_step_score[torch.where(last_step_score != -float('inf'))]
# Take one entry per group of num_beams beams
last_step_score = last_step_score[::num_beams]
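What I plan to do with these scores afterwards is roughly the following; the objective and the optimizer name here are just illustrative placeholders, not my actual training code.

# Hypothetical objective: push up the kept log-softmax scores,
# i.e. minimize their negative mean.
loss = -last_step_score.mean()
optimizer.zero_grad()
loss.backward()        # this is where the RuntimeError below is raised
optimizer.step()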

However, when I run the program, I receive an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 50265]], which is output 0 of LogSoftmaxBackward, is at version 17; expected version 0 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

which means that there is an in-place operation somewhere in generate. I suspect it lies in BeamSearchScorer.finalize, but I can’t figure out which part of the source code to change to make this work.
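For anyone unfamiliar with this class of error, here is a minimal, self-contained toy reproduction (unrelated to the actual generate code) that trips the same version-counter check, since LogSoftmaxBackward needs the log-softmax output to compute the gradient:

import torch

x = torch.randn(4, 10, requires_grad=True)
y = torch.log_softmax(x, dim=-1)   # y is saved for backward (output 0 of LogSoftmaxBackward)
y[0, 0] = 0.0                      # in-place write bumps y's version counter
y.sum().backward()                 # RuntimeError: ... modified by an inplace operation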


cc @patrickvonplaten


I’m seeing a similar error when fine-tuning led-large-16384-arxiv on a custom dataset. It gets 2006 steps in before failing with:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [16, 8192, 1]], which is output 0 of ViewBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Setting torch.autograd.set_detect_anomaly(True) gives me the following stack trace, which I’ve truncated to the most recent calls:

File "C:\Users\ThomasWood\source\repos\LifeBio.Memory.AI\seq2seq\fine_tune_snapshots.py", line 166, in <module>           trainer.train(resume_from_checkpoint='checkpoint-2000')                                                               
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 1269, in train                                                 tr_loss += self.training_step(model, inputs)                                                                          
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 1764, in training_step                                         self.scaler.scale(loss).backward()                                                                                    
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\_tensor.py", line 255, in backward                                                      torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)                                    
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\autograd\__init__.py", line 147, in backward                                            Variable._execution_engine.run_backward(                                                                              
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\autograd\function.py", line 87, in apply                                                return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]                                          
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\checkpoint.py", line 138, in backward                                             torch.autograd.backward(outputs_with_grad, args_with_grad)                                                            
File "C:\Users\ThomasWood\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\autograd\__init__.py", line 147, in backward                                            Variable._execution_engine.run_backward(                                                                            
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [16, 8192, 1]], which is output 0 of ViewBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!                                                                                     
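In case it helps anyone reproduce this, anomaly detection was enabled by adding the call right before starting training (assuming trainer is the Trainer instance from my script, as in the trace above):

import torch

torch.autograd.set_detect_anomaly(True)   # record forward-pass stack traces for backward errors
trainer.train(resume_from_checkpoint='checkpoint-2000')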

The only change I’ve made to this notebook is substituting my own dataset for the arxiv dataset.