Gradients of BERT layer outputs to inputs

NitinTitus · December 7, 2020, 4:43pm

I am trying to compute the gradient of the output of one BERT layer with respect to its inputs, token by token, but I keep getting this error: 'RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.' Below is the code snippet:

for count, data in enumerate(iter(data_loader)):
    input_ids = torch.squeeze(data['input_ids'], dim=0)
    attention_mask = torch.squeeze(data['attention_mask'], dim=0)
    last_hidden_state, pooled_output, hidden_states = bert_model(input_ids=input_ids, attention_mask=attention_mask)
    bert_layer_i_output = hidden_states[i][0]
    print(bert_layer_i_output.shape)
    bert_layer_j_output = hidden_states[j][0]
    #print(torch.autograd.grad(bert_layer_j_output, bert_layer_i_output, retain_graph=True, create_graph=True))
    for k in range(bert_layer_i_output.shape[0]):
        gradient = torch.autograd.grad(bert_layer_j_output[k], bert_layer_i_output[k], grad_outputs=torch.ones_like(bert_layer_j_output[k]))
        print(gradient.shape)
        print(torch.norm(gradient))
        break
    break

Below is the stack trace of the error:

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    202     return Variable._execution_engine.run_backward(
    203         outputs, grad_outputs_, retain_graph, create_graph,
--> 204         inputs, allow_unused)
    205
    206

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Am I doing something wrong? Ideally, both tensors should be part of the same computational graph, right?
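
One possible explanation, offered as a guess rather than a confirmed diagnosis: indexing a tensor (for example hidden_states[i][0] or bert_layer_i_output[k]) creates a new tensor in the autograd graph, and the later layer was computed from the full, unindexed tensor rather than from that slice. In that case autograd would see the slice as unused. The sketch below reproduces this on a toy stack of layers that stands in for BERT. ToyLayer, the shapes, and the indices i, j, k are all hypothetical and chosen only for illustration. It differentiates with respect to the full tensor and indexes the gradient afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyLayer(nn.Module):
    """Hypothetical stand-in for an encoder layer (NOT BERT).

    Maps [batch, seq_len, hidden] -> [batch, seq_len, hidden] and mixes
    information across tokens, loosely imitating attention.
    """

    def __init__(self, hidden):
        super().__init__()
        self.linear = nn.Linear(hidden, hidden)

    def forward(self, x):
        # Adding the per-sequence mean lets every token influence every other token.
        return torch.tanh(self.linear(x) + x.mean(dim=1, keepdim=True))


batch, seq_len, hidden = 1, 5, 8
layers = nn.ModuleList([ToyLayer(hidden) for _ in range(4)])

# Collect every intermediate activation, similar in spirit to a tuple of hidden states.
hidden_states = [torch.randn(batch, seq_len, hidden)]
for layer in layers:
    hidden_states.append(layer(hidden_states[-1]))

i, j, k = 1, 3, 0
layer_i_full = hidden_states[i]        # the exact tensor the next layer consumed
layer_j_output = hidden_states[j][0]   # [seq_len, hidden]

# Attempt 1: differentiate w.r.t. a slice. The slice is a new tensor that
# layer_j_output was never computed from, so autograd raises a RuntimeError.
try:
    torch.autograd.grad(
        layer_j_output[k],
        hidden_states[i][0][k],
        grad_outputs=torch.ones_like(layer_j_output[k]),
        retain_graph=True,
    )
    slice_failed = False
except RuntimeError:
    slice_failed = True

# Attempt 2: differentiate w.r.t. the full tensor, then index the result.
# torch.autograd.grad returns a tuple, so unpack it before using .shape.
(gradient,) = torch.autograd.grad(
    layer_j_output[k],
    layer_i_full,
    grad_outputs=torch.ones_like(layer_j_output[k]),
    retain_graph=True,
)

print(slice_failed)
print(gradient.shape)              # same shape as layer_i_full
print(torch.norm(gradient[0][k]))  # contribution of input token k to output token k
```

If the same reasoning applies to the snippet above, the change would be to pass hidden_states[i] (without [0] or [k]) as the inputs argument and index the returned gradient instead. Row m of gradient[0] would then describe how input token m affects output token k, which may be more informative than the k-to-k entry alone since attention mixes tokens.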
