RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Hi there!
I’m working on an NLP text sequence classification notebook on Kaggle.

The dataset I’m using has headlines and a label (0 or 1) that indicates whether the headline is sarcastic or not.

I’m trying to train with PyTorch Lightning, but every time I start training I get the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Can someone help to identify what I need to change?

I’m lost here. The tensors coming out of the model should have a grad_fn set.

Thanks in advance.

Running the following code directly (outside Lightning) works; the grad_fn is set:

import torch
from transformers import BertForSequenceClassification

# BERT_MODEL_NAME is defined earlier in the notebook
model = BertForSequenceClassification.from_pretrained(BERT_MODEL_NAME, return_dict=True, num_labels=1)
model(torch.ones(2, 34).long(), labels=torch.ones(2, 1))

I get:

SequenceClassifierOutput(loss=tensor(1.5051, grad_fn=<MseLossBackward0>), logits=tensor([[-0.2268],
        [-0.2268]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

But when I run the same model from my LightningModule, I get the RuntimeError mentioned above:

import pytorch_lightning as pl
from transformers import (
    BertForSequenceClassification,
    AdamW,
    get_linear_schedule_with_warmup,
)


class SarcasmTagger(pl.LightningModule):

    def __init__(
        self, 
        model_name: str, 
        n_classes: int, 
        n_training_steps=None, 
        n_warmup_steps=None
    ):
        super().__init__()
        
        self.save_hyperparameters()
        
        self.bert = BertForSequenceClassification.from_pretrained(model_name, return_dict=True, num_labels=n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps

    def forward(self, input_ids, attention_mask, labels):
        # Passing labels makes BertForSequenceClassification compute and return the loss
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        return outputs
    
    def shared_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        labels = batch["label"]
        outputs = self(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        return outputs, loss, labels

    def training_step(self, batch, batch_idx):
        outputs, loss, labels = self.shared_step(batch, batch_idx)
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": outputs, "labels": labels}

    def validation_step(self, batch, batch_idx):
        outputs, loss, label = self.shared_step(batch, batch_idx)
        self.log("val_loss", loss, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        outputs, loss, label = self.shared_step(batch, batch_idx)
        self.log("test_loss", loss, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)

        scheduler = get_linear_schedule_with_warmup(
          optimizer,
          num_warmup_steps=self.n_warmup_steps,
          num_training_steps=self.n_training_steps
        )

        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(
                scheduler=scheduler,
                interval='step')
        )
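For context, this is roughly how I build and fit the module. This is a simplified sketch: the step counts, loader names, and Trainer settings below are placeholders, not my exact notebook code.

import pytorch_lightning as pl

# Placeholder values for illustration; in the notebook n_training_steps is
# derived from len(train_loader) * max_epochs.
model = SarcasmTagger(
    model_name=BERT_MODEL_NAME,
    n_classes=1,
    n_training_steps=1000,
    n_warmup_steps=100,
)

trainer = pl.Trainer(max_epochs=3, accelerator="gpu", devices=1)
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)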

Inside Lightning, my outputs come out like this:

SequenceClassifierOutput(loss=tensor(0.6889, device='cuda:0'), logits=tensor([[-0.1969],
        [-0.5344],
        [-0.2181],
        [-0.2516],
        [-0.3895],
        [-0.4390],
        [-0.4549],
        [-0.3304],
        [-0.4036],
        [-0.3530],
        [-0.3621],
        [-0.3212]], device='cuda:0'), hidden_states=None, attentions=None)

No grad_fn is set. What could be the source of the problem?
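In case it helps others debugging the same thing: the loss only gets a grad_fn if autograd is enabled and the model parameters require grad. A quick check (a debugging sketch, not part of my notebook; model is the SarcasmTagger instance and batch is one batch from the DataLoader) would be:

import torch

def check_grad_setup(model, batch):
    # Both conditions below must hold for outputs.loss to carry a grad_fn.
    print("autograd enabled:", torch.is_grad_enabled())
    print("params require grad:", any(p.requires_grad for p in model.parameters()))
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["label"],
    )
    print("loss grad_fn:", outputs.loss.grad_fn)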

I found the solution!!!

First, I tried changing the package versions based on what I had read elsewhere, so I changed my install cell to:

!pip install torch==2.0.0+cu117
!pip install pytorch-lightning==1.9.4
!pip install accelerate==0.21.0
!pip install tokenizers==0.13.3
!pip install transformers==4.26.1

But the error was still popping up, so I started to suspect the optimizer. Mine was this one:

def configure_optimizers(self):
    optimizer = AdamW(self.parameters(), lr=2e-5)

    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps
    )

    return dict(
        optimizer=optimizer,
        lr_scheduler=dict(
            scheduler=scheduler,
            interval='step')
    )

When I changed my method to use a simple Adam optimizer:

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=2e-5)
    return [optimizer]

It worked!

So, the problem is in transformers.AdamW combined with the scheduler. Reverting the install packages to just:

!pip install -q transformers

Makes the training work.

As transformers.AdamW is deprecated, I think it is a good idea to change the code to use torch.optim.Adam or torch.optim.AdamW instead.

But, anyway, this looks like a bug in transformers.AdamW.
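For anyone who still wants the warmup schedule, this is the shape of what I mean; a sketch I have not re-tested, using torch.optim.AdamW in place of transformers.AdamW:

import torch
from transformers import get_linear_schedule_with_warmup

def configure_optimizers(self):
    # Maintained torch.optim.AdamW instead of the deprecated transformers.AdamW
    optimizer = torch.optim.AdamW(self.parameters(), lr=2e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps,
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }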
