Pytorch BERT model not converging

apugantsov · February 25, 2021, 5:43pm

I’m currently working in research in transfer learning, and I’m trying to use bert-base-cased as a pretrained baseline, wrapped in a Pytorch model with dropout and a linear layer. I largely follow the recommendations of BERT and train using AdamW optimizer and a scheduler.

My issue is, I can train on one task no problem, with the BERT recommended parameters of LR=2e-5, batch size=32, epochs=2. I use a cross entropy loss with class weighting to address a label imbalance issue.

My issue is when I save this model, then reload the state as a base for further downstream training on a different classification task, it never finishes a single epoch. What’s also strange is I get 100% GPU utilisation, which doesn’t change (not the same on the baseline models).

def train(self,
          optimizer,
          scheduler,
          minibatches: torch.utils.data.DataLoader) -> Dict:
    self.model = self.model.train()

    metrics = {"n_correct": 0,
               "losses": []}

    for batch in minibatches:
        texts, targets = batch
        targets = targets.to(self.device)
        encoded_input = BERTPreprocessor.encode(texts, self.tokenizer)

        for tensor in encoded_input:
            encoded_input[tensor] = (encoded_input[tensor]
                                     .to(self.device))

        logits, *_ = self.model(**{
            "input_ids": encoded_input["input_ids"],
            "attention_mask": encoded_input["attention_mask"],
        })

        loss = self.loss_fn(logits, targets)

        _, preds = torch.max(logits, dim=1)
        metrics["n_correct"] += torch.sum(preds == targets)
        metrics["losses"].append(loss.item())

        loss.backward()
        torch.nn.utils.clip_grad_norm_(self.model.parameters(),
                                       max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    return metrics

I’ve attached my training process above and I’m happy to provide any further information, but as I’m fairly new to the field, I’m at a loss with how to address this.

To add, the weighting scheme I use for the weight parameter in the loss function is num_minority_class/class_n for the negative and positive classes (binary classification).

rgwatwormhill · February 26, 2021, 11:42am

Hi,

[I am not an expert, but I have saved and reloaded Bert models].

What commands are you using to save and reload your model? Are you saving just the BERT weights, or your custom dropout and linear layers too?

Do you get any error messages on reloading? Do you get any error messages when you try to run your second classification task (or does it just go on in an infinite loop)?

Have you tried running your second task immediately after your first task (without doing a save and reload)? Does it work?

[Silly questions] How do you know your first task has “worked”? Are you using BertModel, and have you considered using BertForSequenceClassification? When you train, are you updating the BERT weights, or have you frozen them?

Topic		Replies	Views
Loading pytorch_pretrained_bert models with transformers Beginners	2	1918	April 29, 2021
Can not make transfer learning Beginners	0	534	January 19, 2021
Issue with converting my own BERT TF2 checkpoint to PyTorch and loading the PyTorch model for training 🤗Transformers	0	542	February 25, 2021
Converting TF-Bert to Torch using conversion script works, but Beginners	4	772	July 23, 2021
Loading model from pytorch_pretrained_bert into transformers library 🤗Transformers	2	8095	January 5, 2022

Pytorch BERT model not converging

Related topics