I am trying to train an EncoderDecoderModel with "roberta-base" as the encoder and "gpt2" as the decoder on the SQuAD dataset.
I have preprocessed the dataset so that input_ids and attention_mask are produced by the RoBERTa tokenizer and the labels are produced by the GPT-2 tokenizer, padded with -100.
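To make the label format concrete, here is a minimal sketch of the padding step only. The helper `pad_labels` and the token ids are hypothetical, not the actual preprocessing code; the point is that padded positions hold -100, which PyTorch's cross-entropy loss ignores by default.

```python
def pad_labels(sequences, pad_value=-100):
    """Right-pad each list of token ids to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Made-up token ids standing in for GPT-2-tokenized answers.
batch = [[464, 3280], [1212, 318, 257, 1332]]
print(pad_labels(batch))
# [[464, 3280, -100, -100], [1212, 318, 257, 1332]]
```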
The following is my training loop:
import torch
import numpy as np
from tqdm.notebook import tqdm

model.to(device)
epochs = 3
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for epoch in range(epochs):  # loop over the dataset multiple times
    print(f"------------ EPOCH:{epoch+1} ------------")
    # train + evaluate on training data
    val_f1 = 0.0
    losses = []
    k = 0
    for i, batch in enumerate(tqdm(train_dataloader)):
        model.train()
        # get the inputs;
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        #decoder_attention_mask = batch["decoder_attention_mask"].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        try:
            outputs = model(input_ids=input_ids,
                            attention_mask=attention_mask,
                            labels=labels)
                            #decoder_attention_mask = decoder_attention_mask)
        except:
            k += 1
            continue
        loss = outputs.loss
        losses.append(loss.item())
        #if i % 50 == 0:
        print("\rLoss:", np.mean(losses), end='')
        loss.backward()
        optimizer.step()
    # evaluate (batch generation)
    model.eval()
    print('\nEVALUATING...')
    val_f1 = []
    for eval_batch in tqdm(val_dataloader):
        outputs = model.generate(eval_batch["input_ids"].to(device))
        # compute metrics
        metrics = compute_metrics(pred_ids=outputs, labels_ids=eval_batch["labels"])
        val_f1.append(metrics)
    print("\nVal F1:", np.mean(val_f1), "\nN Fails:", k)
It seems to work, and the loss goes down for a few batches, until it raises the following error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 20
     18 model.train()
     19 # get the inputs;
---> 20 input_ids = batch["input_ids"].to(device)
     21 attention_mask = batch["attention_mask"].to(device)
     22 labels = batch["labels"].to(device)

RuntimeError: CUDA error: device-side assert triggered
I had a look around, and this error is usually attributed to an inconsistency between the number of labels/classes and the number of output units, or to incorrect input to the loss function. That doesn't make sense to me, though, because training seems to work for a while and the error is raised when calling input_ids = batch["input_ids"].to(device).
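One way to test that hypothesis is to check, on the CPU and before the forward pass, that every token id falls inside the vocabulary it will index. Below is a minimal sketch; `find_out_of_range` is a hypothetical helper, the ids are made up, and 50257 is only a stand-in for the decoder's vocabulary size.

```python
def find_out_of_range(ids, vocab_size, ignore_index=None):
    """Return (row, col, value) for every id outside [0, vocab_size)."""
    # Accept tensors/arrays as well as nested lists (assumes a 2-D batch).
    rows = ids.tolist() if hasattr(ids, "tolist") else ids
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if ignore_index is not None and value == ignore_index:
                continue  # positions the loss is told to skip
            if value < 0 or value >= vocab_size:
                bad.append((r, c, value))
    return bad


# Toy batch with made-up ids; 50257 is an assumed decoder vocabulary size.
labels = [[15496, 995, -100, -100], [50257, 11, 50256, -100]]
print(find_out_of_range(labels, vocab_size=50257, ignore_index=-100))
# [(1, 0, 50257)]
```

In the training loop this could be called on batch["input_ids"] with the encoder's vocabulary size and on batch["labels"] (with ignore_index=-100) with the decoder's, assuming those sizes are read from model.config.encoder.vocab_size and model.config.decoder.vocab_size. Since CUDA reports errors asynchronously, the line shown in the traceback is not necessarily where the fault occurred; running with the environment variable CUDA_LAUNCH_BLOCKING=1, or running a failing batch on the CPU, generally gives a more precise trace.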
Has anyone encountered the same issue?