“element 0 of tensors does not require grad and does not have a grad_fn”
Is it likely to be a problem with the model, the loss function, or the shape of the data tensors?
I have a dataset of texts, each with an associated real value. I want to fine-tune a BERT model on this data and then visualize the attention weights (for any given text) and how they are altered by the fine-tuning. I have defined a model based on the transformers BertModel, which passes the pooled_output (the CLS token representation) through two more dense layers, the first followed by a ReLU and the second by a Sigmoid.
I’ve been following abhimishra91 (https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb) and Chris McCormick (https://mccormickml.com/2019/07/22/BERT-fine-tuning/#4-train-our-classification-model), but I’m stuck at the .backward() step.
I’ve tried calculating the loss as part of the forward pass (inside the model class definition) and outside of it (in the training loop, sketched after the model definition below), but either way I get the same error: “element 0 of tensors does not require grad and does not have a grad_fn”.
Model
import torch
import transformers

class ATBertClass(torch.nn.Module):
    def __init__(self):
        super(ATBertClass, self).__init__()
        # BERT backbone, returning attention weights for later visualization
        self.L1bb = transformers.BertModel.from_pretrained('bert-base-uncased',
                                                           output_attentions=True)
        self.L2Lin = torch.nn.Linear(768, 64)
        self.L3Rel = torch.nn.ReLU()
        self.L4Lin = torch.nn.Linear(64, 1)
        self.L5Sig = torch.nn.Sigmoid()

    def forward(self, input_ids, attention_mask, labels):
        # Tuple output: (sequence_output, pooled_output, attentions);
        # labels is accepted but not used in this version of forward()
        _, output_1, attns = self.L1bb(input_ids=input_ids,
                                       attention_mask=attention_mask)
        output_2 = self.L2Lin(output_1)
        output_3 = self.L3Rel(output_2)
        output_4 = self.L4Lin(output_3)
        output_5 = self.L5Sig(output_4)
        return output_5, attns
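Training step
For reference, my training step follows the pattern from those tutorials, with the loss computed outside the model. A simplified sketch (loss_fn, optimizer, train_loader and the batch key names here are placeholders, not my exact code):

model = ATBertClass()
model.train()
loss_fn = torch.nn.MSELoss()   # real-valued targets, so MSE as an example
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

for batch in train_loader:     # train_loader is a placeholder DataLoader
    optimizer.zero_grad()
    outputs, attns = model(input_ids=batch['input_ids'],
                           attention_mask=batch['attention_mask'],
                           labels=batch['labels'])
    loss = loss_fn(outputs.squeeze(-1), batch['labels'].float())
    loss.backward()            # <-- this is where the error is raised
    optimizer.step()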
Is there anything obviously wrong with the model definition?
(I’m not sure whether defining torch.nn.ReLU and torch.nn.Sigmoid as separate layers like this is correct.)
Can anyone advise?