How to make Data Loader for "Multi-Head" Regression which can be used with Trainer

deshwalmahesh · December 24, 2023, 8:49pm

I am working on a multi-head regression problem where, for each text, I want to predict 5 scores. This can be done by setting problem_type = 'regression', as given in the transformers code.

The issue is that when I run my model with Trainer, it raises the following error:

Error:

    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

It worked with num_labels = 1, but when I set it to 5, it throws this error. Below is the minimal code for my model and data.

Model

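from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=5 gives one output per score; problem_type selects the regression loss
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=5,
                                                           problem_type="regression")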
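
For reference, and as far as I understand the regression setup: with problem_type = "regression" and num_labels=5, the loss is a mean-squared error between logits of shape (batch_size, 5) and labels of the same shape, so the labels would need to be floats. The snippet below only imitates that with plain PyTorch. The random logits are a stand-in for the model output, the sizes are illustrative, and nothing here calls transformers.

```python
import torch

batch_size, num_targets = 2, 5  # illustrative sizes, not taken from a real run

# Stand-in for the model output: one row of 5 predicted scores per text.
logits = torch.randn(batch_size, num_targets)

# Regression targets as floats: one row of 5 scores per text.
labels = torch.tensor([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], dtype=torch.float)

# Mean-squared error over all entries, returned as a scalar tensor.
loss = torch.nn.MSELoss()(logits, labels)
print(logits.shape, labels.shape, loss.shape)
```

So each batch should carry labels of shape (batch_size, 5), not a nested Python list per example.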

Custom DataLoader:

import torch

class MultiRegressionDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.labels = labels
        self.texts = texts

    def __getitem__(self, idx, sanity_check=False):
        # The tokenizer returns a dict-like object
        output = tokenizer(self.texts[idx], truncation=True,
                           padding="max_length",
                           max_length=128)

        # Attach the 5 target scores for this text
        output['labels'] = torch.tensor(self.labels[idx])

        return output

data = MultiRegressionDataset(["text1", "text2"], [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])

data[0]  # Gives a value

I also tried the following:

output['labels'] = torch.tensor(self.labels[idx]).unsqueeze(-1)
A combination of return_tensors = "pt" with the above

Nothing worked. What am I doing wrong here?
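
One direction that may be worth checking, offered as an unverified sketch and not a confirmed fix: return a plain dict of tensors from __getitem__ instead of the tokenizer's own output object, cast the labels to float (the regression loss is a mean-squared error, which expects float targets), and define __len__, which Trainer generally needs for a map-style dataset. In the sketch, passing the tokenizer into the constructor and the max_length argument are my own additions for illustration. Any callable that returns a dict of equal-length token id lists will do, which is also how the example can be exercised without downloading a model.

```python
import torch


class MultiRegressionDataset(torch.utils.data.Dataset):
    """Yields plain dicts; `labels` is a float tensor of shape (num_targets,)."""

    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        # Assumption: a callable returning a dict of token id lists,
        # e.g. a Hugging Face tokenizer.
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx],
            truncation=True,
            padding="max_length",
            max_length=self.max_length,
        )
        # Copy into a plain dict of tensors so that every value,
        # labels included, has the same type.
        item = {key: torch.tensor(value) for key, value in encoding.items()}
        # Float targets, one entry per regression head.
        item["labels"] = torch.tensor(self.labels[idx], dtype=torch.float)
        return item
```

With the setup from the question this would be built as MultiRegressionDataset(["text1", "text2"], [[1,2,3,4,5], [5,4,3,2,1]], tokenizer). If the error persists, passing transformers.default_data_collator as data_collator to Trainer might be another thing to try, since it stacks already-padded tensors without calling the tokenizer's padding logic; I have not tested that here.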
