RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Hi there!
I’m working on an NLP text sequence classification notebook on Kaggle.

The dataset I’m using has headlines and a label (0 or 1) that indicates whether the headline is sarcastic or not.

I’m trying to train with PyTorch Lightning, but every time I start training I get the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Can someone help to identify what I need to change?

I’m lost here. The tensors coming out of the model should have a grad_fn set.

Thanks in advance.

Running the following code directly (outside Lightning) works; the grad_fn is set:

import torch
from transformers import BertForSequenceClassification

# BERT_MODEL_NAME is defined earlier in the notebook
model = BertForSequenceClassification.from_pretrained(BERT_MODEL_NAME, return_dict=True, num_labels=1)
model(torch.ones(2, 34).long(), labels=torch.ones(2, 1))

I get:

SequenceClassifierOutput(loss=tensor(1.5051, grad_fn=<MseLossBackward0>), logits=tensor([[-0.2268],
        [-0.2268]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

But when I run the same model from my LightningModule, I get the RuntimeError mentioned above:

import pytorch_lightning as pl
from transformers import (
    BertForSequenceClassification,
    AdamW,
    get_linear_schedule_with_warmup,
)


class SarcasmTagger(pl.LightningModule):

    def __init__(
        self, 
        model_name: str, 
        n_classes: int, 
        n_training_steps=None, 
        n_warmup_steps=None
    ):
        super().__init__()
        
        self.save_hyperparameters()
        
        self.bert = BertForSequenceClassification.from_pretrained(model_name, return_dict=True, num_labels=n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps

    def forward(self, input_ids, attention_mask, labels):
        # Passing labels makes BertForSequenceClassification compute and return the loss
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        return outputs
    
    def shared_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        labels = batch["label"]
        outputs = self(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        return outputs, loss, labels

    def training_step(self, batch, batch_idx):
        outputs, loss, labels = self.shared_step(batch, batch_idx)
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": outputs, "labels": labels}

    def validation_step(self, batch, batch_idx):
        outputs, loss, label = self.shared_step(batch, batch_idx)
        self.log("val_loss", loss, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        outputs, loss, label = self.shared_step(batch, batch_idx)
        self.log("test_loss", loss, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)

        scheduler = get_linear_schedule_with_warmup(
          optimizer,
          num_warmup_steps=self.n_warmup_steps,
          num_training_steps=self.n_training_steps
        )

        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(
                scheduler=scheduler,
                interval='step')
        )
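For context, this is roughly how I build and fit the module. This is a simplified sketch: the step counts, loader names, and Trainer settings below are placeholders, not my exact notebook code.

import pytorch_lightning as pl

# Placeholder values for illustration; in the notebook n_training_steps is
# derived from len(train_loader) * max_epochs.
model = SarcasmTagger(
    model_name=BERT_MODEL_NAME,
    n_classes=1,
    n_training_steps=1000,
    n_warmup_steps=100,
)

trainer = pl.Trainer(max_epochs=3, accelerator="gpu", devices=1)
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)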

Inside Lightning, my outputs come out like this:

SequenceClassifierOutput(loss=tensor(0.6889, device='cuda:0'), logits=tensor([[-0.1969],
        [-0.5344],
        [-0.2181],
        [-0.2516],
        [-0.3895],
        [-0.4390],
        [-0.4549],
        [-0.3304],
        [-0.4036],
        [-0.3530],
        [-0.3621],
        [-0.3212]], device='cuda:0'), hidden_states=None, attentions=None)

No grad_fn is set. What could be the source of the problem?
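In case it helps others debugging the same thing: the loss only gets a grad_fn if autograd is enabled and the model parameters require grad. A quick check (a debugging sketch, not part of my notebook; model is the SarcasmTagger instance and batch is one batch from the DataLoader) would be:

import torch

def check_grad_setup(model, batch):
    # Both conditions below must hold for outputs.loss to carry a grad_fn.
    print("autograd enabled:", torch.is_grad_enabled())
    print("params require grad:", any(p.requires_grad for p in model.parameters()))
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["label"],
    )
    print("loss grad_fn:", outputs.loss.grad_fn)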

I found the solution!!!

First, I tried changing the package versions based on what I had read elsewhere, so I changed my install cell to:

!pip install torch==2.0.0+cu117
!pip install pytorch-lightning==1.9.4
!pip install accelerate==0.21.0
!pip install tokenizers==0.13.3
!pip install transformers==4.26.1

But the error was still popping up, so I started to suspect the optimizer. Mine was this one:

def configure_optimizers(self):
    optimizer = AdamW(self.parameters(), lr=2e-5)

    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps
    )

    return dict(
        optimizer=optimizer,
        lr_scheduler=dict(
            scheduler=scheduler,
            interval='step')
    )

When I changed my method to use a simple Adam optimizer:

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=2e-5)
    return [optimizer]

It worked!

So, the problem is in transformers.AdamW combined with the scheduler. Reverting the install packages to just:

!pip install -q transformers

Makes the training work.

As transformers.AdamW is deprecated, I think it is a good idea to change the code to use torch.optim.Adam or torch.optim.AdamW instead.

But, anyway, this looks like a bug in transformers.AdamW.
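For anyone who still wants the warmup schedule, this is the shape of what I mean; a sketch I have not re-tested, using torch.optim.AdamW in place of transformers.AdamW:

import torch
from transformers import get_linear_schedule_with_warmup

def configure_optimizers(self):
    # Maintained torch.optim.AdamW instead of the deprecated transformers.AdamW
    optimizer = torch.optim.AdamW(self.parameters(), lr=2e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps,
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }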
