EncoderDecoderModel with different checkpoint training

Davidai · January 24, 2023, 11:41am

I am trying to train an EncoderDecoderModel with “roberta-base” as the encoder and “gpt2” as the decoder on the SQuAD dataset.
I have preprocessed the dataset so that input_ids and attention_mask come from the RoBERTa tokenizer and the labels come from the GPT-2 tokenizer, with padding positions set to -100.
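
For reference, the label padding described above can be sketched roughly as follows. This is a minimal illustration, assuming the labels are plain Python lists of GPT-2 token ids; `pad_labels` is a hypothetical helper name and not part of transformers. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so padded positions do not contribute to the loss.

```python
def pad_labels(label_ids, ignore_index=-100):
    """Right-pad every label sequence to the longest one in the batch.

    label_ids: list of lists of decoder token ids (hypothetical input format).
    ignore_index: value used for padding; -100 is skipped by the loss.
    """
    max_len = max(len(ids) for ids in label_ids)
    return [ids + [ignore_index] * (max_len - len(ids)) for ids in label_ids]


batch_labels = pad_labels([[5, 6, 7], [8]])
print(batch_labels)
```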

The following is my training loop.

import torch
import numpy as np
from tqdm.notebook import tqdm

model.to(device)

epochs = 3

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for epoch in range(epochs):  # loop over the dataset multiple times
   print(f"------------ EPOCH:{epoch+1} ------------")
   # train + evaluate on training data
   val_f1 = 0.0
   losses = []
   k=0
   for i,batch in enumerate(tqdm(train_dataloader)):
      model.train()
      # get the inputs; 
      input_ids = batch["input_ids"].to(device)
      attention_mask = batch["attention_mask"].to(device)
      labels = batch["labels"].to(device)
      #decoder_attention_mask = batch["decoder_attention_mask"].to(device)

      # zero the parameter gradients
      optimizer.zero_grad()

      # forward + backward + optimize
      
      try:
            outputs = model(input_ids=input_ids,
                            attention_mask=attention_mask,
                            labels=labels)
                            #decoder_attention_mask = decoder_attention_mask)
      except Exception:
            # count and skip batches whose forward pass raises
            k+=1
            continue
      loss = outputs.loss
      losses.append(loss.item())
      #if i % 50 == 0:
      print("\rLoss:", np.mean(losses), end='')
      loss.backward()
      optimizer.step()

   # evaluate (batch generation)
   model.eval()
   print('\nEVALUATING...')
   val_f1 = []
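   for eval_batch in tqdm(val_dataloader):
       # pass the attention mask so padded positions are ignored during generation
       # (assumes the validation batches also carry "attention_mask")
       outputs = model.generate(eval_batch["input_ids"].to(device),
                                attention_mask=eval_batch["attention_mask"].to(device))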
       # compute metrics
       metrics = compute_metrics(pred_ids=outputs, labels_ids=eval_batch["labels"])
       val_f1.append(metrics)
  
   print("\nVal F1:", np.mean(val_f1), "\nN Fails:", k)

It seems to work, and the loss goes down for a few batches, until it fails with the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 20
     18 model.train()
     19 # get the inputs; 
---> 20 input_ids = batch["input_ids"].to(device)
     21 attention_mask = batch["attention_mask"].to(device)
     22 labels = batch["labels"].to(device)

RuntimeError: CUDA error: device-side assert triggered

I had a look around, and it seems to be caused by an inconsistency between the number of labels/classes and the number of output units, or by an incorrect input to the loss function. That doesn't make sense to me, because training seems to work for a while and the error is raised when calling input_ids = batch["input_ids"].to(device).
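
One way to test that hypothesis would be to scan each batch on the CPU before it reaches the GPU. The sketch below is only an illustration under stated assumptions: `find_bad_batch_values` is a hypothetical helper, it takes plain nested lists (for example from `tensor.tolist()`), and the vocabulary sizes and maximum encoder length are supplied by the caller rather than read from the model. It flags token ids outside the embedding range and inputs longer than the encoder's position limit, both of which can surface as a device-side assert. Because CUDA kernels run asynchronously, the Python line shown in the traceback is not necessarily where the failure happened.

```python
def find_bad_batch_values(input_ids, labels, encoder_vocab_size,
                          decoder_vocab_size, max_encoder_len,
                          ignore_index=-100):
    """Return a list of human-readable problems found in one batch.

    input_ids, labels: nested lists of ints (one inner list per example).
    All sizes are caller-supplied assumptions, not values read from a model.
    """
    problems = []
    for row, ids in enumerate(input_ids):
        # Too-long inputs index past the position embedding table.
        if len(ids) > max_encoder_len:
            problems.append(
                f"input_ids[{row}] has length {len(ids)} > {max_encoder_len}")
        # Ids outside [0, vocab_size) index past the token embedding table.
        bad = [t for t in ids if not 0 <= t < encoder_vocab_size]
        if bad:
            problems.append(f"input_ids[{row}] has out-of-range ids {bad}")
    for row, ids in enumerate(labels):
        # ignore_index entries are legal; every other label must be a valid id.
        bad = [t for t in ids
               if t != ignore_index and not 0 <= t < decoder_vocab_size]
        if bad:
            problems.append(f"labels[{row}] has out-of-range ids {bad}")
    return problems


# Toy example with a made-up vocabulary of 10 tokens and a length limit of 4.
for problem in find_bad_batch_values([[0, 11, 2, 1, 1]], [[3, 10, -100]],
                                     10, 10, 4):
    print(problem)
```

In the training loop this could be called on `batch["input_ids"].tolist()` and `batch["labels"].tolist()`, with the sizes taken from `model.config.encoder` and `model.config.decoder`. Rerunning the failing batch on the CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually gives a more precise traceback as well.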

Has anyone encountered the same issue?
