Errors when fine-tuning T5

Hi everyone,

I’m trying to fine-tune a T5 model. I followed most (all?) of the tutorials, notebooks and code snippets from the Transformers library to understand what to do, but so far I’m only getting errors. The end goal is to give T5 a task such as finding the max/min of a sequence of numbers, but I’m starting with something really small, just to see if I understand how things work.

I’m using Transformers v4.2.2 (Tokenizers v0.9.4).

This is what I have understood so far, which I think is correct (excuse the French, I’m working on that too :wink: ):

import torch
from transformers import (
    T5TokenizerFast,
    T5ForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

prefix = "translate English to French:"
inputs = [f"{prefix} How are you?", f"{prefix} My name is Ben", f"{prefix} My cat is great"]
outputs = ["Comment ca va?", "Je m'appelle Ben", "Mon chat est genial"]
model_inputs = tokenizer(inputs, padding=True, truncation=True, return_tensors="pt")
with tokenizer.as_target_tokenizer():
    labels = tokenizer(outputs, padding=True, truncation=True, return_tensors="pt")
model_inputs["labels"] = labels["input_ids"]

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, examples):
        self.examples = examples
    
    def __getitem__(self, idx):
        return self.examples[idx]
    
    def __len__(self):
        return len(self.examples)

train = MyDataset([model_inputs])

training_args = Seq2SeqTrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    per_device_train_batch_size=2,
    num_train_epochs=3,
    run_name="T5 Experiment",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train,
    tokenizer=tokenizer
)

trainer.train()

The error I’m getting is ValueError: too many values to unpack (expected 2), which happens in the forward method in transformers/models/t5/modeling_t5.py, on line 877 (880 on the master branch):

batch_size, seq_length = input_shape

Looking at a few lines before the error, I see input_shape is just input_ids.size(), and my model_inputs["input_ids"] is indeed a two-dimensional PyTorch tensor, so I don’t understand why the unpacking crashes.
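
For what it’s worth, this is how I checked (the padded length depends on the longest sentence, so I just print it):

print(model_inputs["input_ids"].shape)  # torch.Size([3, padded_length]) -> 2D, as expected
print(model_inputs["labels"].shape)     # also 2D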

I have no idea whether I’m doing something wrong in the model, in the tokenization, in the training, I really am lost.

Any help is greatly appreciated!

Good job posting your issue in a very clear fashion; it was very easy to reproduce and debug :slight_smile:

So the problem is that you are using model_inputs incorrectly. It contains the keys input_ids, labels and attention_mask, and each value is a 2D tensor whose first dimension is 3 (your number of sentences). Your dataset should actually dig into that dictionary in its __getitem__:

    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.examples.items()}

That way, each element of your dataset is a dictionary with the three keys, each pointing to one of the encoded sentences (so the associated values are 1D tensors). Then you should pass model_inputs without putting it in a list:

train = MyDataset(model_inputs)

and it should work.

With your current code, the model_inputs["input_ids"] your model receives is a 3D tensor of shape 1 x 3 x n.
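
Putting both changes together, a minimal sketch of the corrected dataset (same class as yours; only __getitem__, __len__ and the construction change):

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, examples):
        # examples is the BatchEncoding returned by the tokenizer: a dict of 2D tensors
        self.examples = examples

    def __getitem__(self, idx):
        # pick the idx-th encoded sentence for every key -> values are 1D tensors
        return {k: v[idx] for k, v in self.examples.items()}

    def __len__(self):
        # number of sentences, not number of dictionary keys
        return len(self.examples["input_ids"])

train = MyDataset(model_inputs)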


Thank you so much, Sylvain!

Hi everyone! I have the same issue, “ValueError: too many values to unpack (expected 2)”, while fine-tuning a BERT model.

I’m very new to all of that, so could you please help me find the issue?

class MyDataset(Dataset):

    def __init__(self, df, tokenizer_name=MODEL_name, max_length=1024):
        
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, do_lower_case=False)
        self.seqs, self.labels = self.load_dataset(df)
        self.max_length = max_length

    def __len__(self):
        return len(self.labels)
    def load_dataset(self,df):
        seq = list(df['sequence'])
        label = list(df['score'])
        assert len(seq) == len(label)
        return seq, label
        
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        seq = " ".join("".join(self.seqs[idx].split()))
        seq = re.sub(r"[UZOB]", "X", seq)

        seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt').to(device)

        sample = {key: torch.tensor(val) for key, val in seq_ids.items()}
        sample['labels'] = torch.tensor(self.labels[idx])

        return sample

When I print input_ids.size() for one batch, I get torch.Size([64, 1, 1024]), but I believe the model expects a shape of (64, 1024).

Also, if I drop return_tensors='pt' and .to(device) from seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt').to(device), I get a “not on the same device” error.
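
The only workaround I have come up with so far is squeezing out that extra dimension in __getitem__ (not sure it is the right fix, just what I tried):

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        seq = " ".join("".join(self.seqs[idx].split()))
        seq = re.sub(r"[UZOB]", "X", seq)

        seq_ids = self.tokenizer(seq, truncation=True, padding='max_length',
                                 max_length=self.max_length, return_tensors='pt')
        # return_tensors='pt' adds a leading batch dimension of 1, so drop it here
        sample = {key: val.squeeze(0) for key, val in seq_ids.items()}
        sample['labels'] = torch.tensor(self.labels[idx])
        return sample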

Could you give us the whole stack trace? Thanks.

#2d
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Device:", device)

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F 
import torch.utils.data as DataLoader
from scipy.stats import pearsonr
import numpy as np
import torch.utils.data as data_utils
torch.set_printoptions(precision=10)

import Meter as Meter
import EarlyStopping as EarlyStopping



from transformers import AutoTokenizer, BertForSequenceClassification, BertTokenizerFast
model_name=MODEL_NAME
max_length=1024


from transformers import AutoTokenizer, AutoModelForTokenClassification, BertTokenizerFast, EvalPrediction, BertForSequenceClassification
from torch.utils.data import Dataset

class MyDataset(Dataset):

    def __init__(self, df, tokenizer_name=model_name, max_length=1024):
        
        #self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, do_lower_case=False)
        self.tokenizer = BertTokenizerFast.from_pretrained(tokenizer_name, do_lower_case=False)
        self.seqs, self.labels = self.load_dataset(df)
        self.max_length = max_length

    def __len__(self):
        return len(self.labels)
    def load_dataset(self,df):
        seq = list(df['sequence'])
        label = list(df['target_scaled'])
        assert len(seq) == len(label)
        return seq, label
        
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        seq = " ".join("".join(self.seqs[idx].split()))
        seq = re.sub(r"[UZOB]", "X", seq)

        #seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt').to(device)
        seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length)
        sample = {key: torch.tensor(val) for key, val in seq_ids.items()}
        sample['labels'] = torch.tensor(self.labels[idx])

        return sample

train_seqs_encodings_dataset=MyDataset(df=train_dataset_clean)
valid_seqs_encodings_dataset=MyDataset(df=valid_dataset_clean)

train_loader = DataLoader.DataLoader(
            train_seqs_encodings_dataset,
            batch_size=64,
            shuffle=True
        )
valid_loader = DataLoader.DataLoader(
            valid_seqs_encodings_dataset,
            batch_size=64,
            shuffle=True
        )


torch.manual_seed(100)
learning_rate=0.001

#Initialize network
#model=AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model=BertForSequenceClassification.from_pretrained(model_name, num_labels=1)
#Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate) #to check 
model.train()


metric_name='r2'
num_epochs=5
stopper = EarlyStopping.EarlyStopping(mode='higher', patience=20)

for epoch in range(num_epochs):
    loss_train = 0
    print(f"Epoch: {epoch+1}/{num_epochs}")
    model.train()

    train_meter = Meter.Meter()

    for batch_idx, batch in enumerate(train_loader):
        #print (batch)
        b_input_ids = batch['input_ids'].to(device=device)
        b_token_type_ids = batch['token_type_ids'].to(device=device)
        b_input_mask = batch['attention_mask'].to(device=device)
        b_labels = batch['labels'].to(device=device)

        #b_input_ids=torch.squeeze(b_input_ids)
        
        print (b_input_ids.size())
        #forward
        #predictions = model(data)#predictions=model(data.float())
        loss, predictions = model(b_input_ids, 
                             token_type_ids=b_token_type_ids, 
                             attention_mask=b_input_mask, 
                             labels=b_labels)
        
        loss= (criterion(predictions,targets)).mean()#.to(device)

        optimizer.zero_grad()

        #backward
        loss.backward()
        
        #gradient descent or adam step
        optimizer.step()
        
        loss_train += loss.item()
        train_meter.update(predictions, targets)

        score = np.mean(train_meter.compute_metric(metric_name))
    with torch.no_grad():
        val_score =  run_an_eval_epoch(valid_loader, model, metric_name)
        early_stop = stopper.step(val_score, model)

    total_score = np.mean(train_meter.compute_metric(metric_name))
    print(f'training {metric_name}: {total_score:.4f}, validation: {val_score:.4f} , best validation: {stopper.best_score:.4f}. ')
    
    if early_stop:
        break

This is the code, and the output:

Epoch: 1/5
torch.Size([64, 1, 1024])

/usr/local/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:29: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-0577dee49ad5> in <module>
     26                              token_type_ids=b_token_type_ids,
     27                              attention_mask=b_input_mask,
---> 28                              labels=b_labels)
     29 
     30         loss= (criterion(predictions,targets)).mean()#.to(device)

/usr/local/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1509             output_attentions=output_attentions,
   1510             output_hidden_states=output_hidden_states,
-> 1511             return_dict=return_dict,
   1512         )
   1513 

/usr/local/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    923         elif input_ids is not None:
    924             input_shape = input_ids.size()
--> 925             batch_size, seq_length = input_shape
    926         elif inputs_embeds is not None:
    927             input_shape = inputs_embeds.size()[:-1]

ValueError: too many values to unpack (expected 2)

When I try this with the Trainer instead, I have to change the line:

seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length, return_tensors='pt').to(device)

to:

seq_ids = self.tokenizer(seq, truncation=True, padding='max_length', max_length=self.max_length)

in the MyDataset class, and then:

def model_init():
    bert= BertForSequenceClassification.from_pretrained(model_name, num_labels=1)
    bert=nn.DataParallel(bert)
    return bert.to(device)
    #return BertForSequenceClassification.from_pretrained(model_name, num_labels=1)

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=5,              # total number of training epochs
    per_device_train_batch_size=8,   # batch size per device during training
    per_device_eval_batch_size=8,   # batch size for evaluation
    do_train=True,                   # Perform training
    do_eval=True,                    # Perform evaluation
    evaluation_strategy="epoch",     # evaluate after each epoch
    run_name="Seq Experiment"
)

from scipy.stats import pearsonr
def compute_metrics(predictions, targets):
    return pearsonr(predictions, targets)[0] ** 2

trainer = Trainer(
    model_init=model_init,                # the instantiated 🤗 Transformers model to be trained
    args=training_args,                   # training arguments, defined above
    train_dataset=train_seqs_encodings_dataset,          # training dataset
    eval_dataset=valid_seqs_encodings_dataset,             # evaluation dataset
    compute_metrics = compute_metrics,    # evaluation metrics
)

trainer.train()

I got:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 15.78 GiB total capacity; 11.14 GiB already allocated; 78.75 MiB free; 11.22 GiB reserved in total by PyTorch)

on 8 GPUs :frowning:

Please, someone help me. I am trying to build a text summarizer for my final year project. I am finding it difficult to increase the number of words returned by the model in the example mrm8488/camembert2camembert_shared-finetuned-french-summarization · Hugging Face; it keeps returning the same number of words.

import torch
from transformers import RobertaTokenizerFast, EncoderDecoderModel
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ckpt = 'mrm8488/camembert2camembert_shared-finetuned-french-summarization'
tokenizer = RobertaTokenizerFast.from_pretrained(ckpt)
model = EncoderDecoderModel.from_pretrained(ckpt).to(device)
def generate_summary(text):
    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)
    output = model.generate(input_ids, attention_mask=attention_mask)
    return tokenizer.decode(output[0], skip_special_tokens=True)
   
text = "Un nuage de fumée juste après l’explosion, le 1er juin 2019. Une déflagration dans une importante usine d’explosifs du centre de la Russie a fait au moins 79 blessés samedi 1er juin. L’explosion a eu lieu dans l’usine Kristall à Dzerzhinsk, une ville située à environ 400 kilomètres à l’est de Moscou, dans la région de Nijni-Novgorod. « Il y a eu une explosion technique dans l’un des ateliers, suivie d’un incendie qui s’est propagé sur une centaine de mètres carrés », a expliqué un porte-parole des services d’urgence. Des images circulant sur les réseaux sociaux montraient un énorme nuage de fumée après l’explosion. Cinq bâtiments de l’usine et près de 180 bâtiments résidentiels ont été endommagés par l’explosion, selon les autorités municipales. Une enquête pour de potentielles violations des normes de sécurité a été ouverte. Fragments de shrapnel Les blessés ont été soignés après avoir été atteints par des fragments issus de l’explosion, a précisé une porte-parole des autorités sanitaires citée par Interfax. « Nous parlons de blessures par shrapnel d’une gravité moyenne et modérée », a-t-elle précisé. Selon des représentants de Kristall, cinq personnes travaillaient dans la zone où s’est produite l’explosion. Elles ont pu être évacuées en sécurité. Les pompiers locaux ont rapporté n’avoir aucune information sur des personnes qui se trouveraient encore dans l’usine."

generate_summary(text)
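
For reference, generate() uses the model config’s default max_length when you don’t pass one, which is why the summaries keep coming out the same length. A sketch of the same call with explicit length controls (the values here are only examples to tune, not recommendations for this checkpoint):

def generate_summary(text, max_length=200, min_length=60):
    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
    input_ids = inputs.input_ids.to(device)
    attention_mask = inputs.attention_mask.to(device)
    # max_length / min_length bound the number of generated tokens (not words);
    # num_beams and no_repeat_ngram_size are optional quality knobs
    output = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=max_length,
        min_length=min_length,
        num_beams=4,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

generate_summary(text)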