How to make a Dataset for "Multi-Head" Regression that can be used with Trainer

I am working on a multi-head regression problem where, for each text, I want to predict 5 scores. According to the transformers code, this is supported by setting problem_type = "regression".
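For context, multi-target regression of this kind can be sketched in plain PyTorch; the sizes below are illustrative stand-ins, not transformers defaults:

```python
import torch
import torch.nn as nn

# Toy stand-in for a sequence-classification head doing 5-score regression.
# hidden_size is a made-up value; num_targets matches the 5 scores per text.
hidden_size, num_targets = 16, 5
head = nn.Linear(hidden_size, num_targets)

pooled = torch.randn(2, hidden_size)            # batch of 2 pooled embeddings
preds = head(pooled)                            # shape (2, 5): one score set per text
targets = torch.tensor([[1., 2., 3., 4., 5.],
                        [5., 4., 3., 2., 1.]])  # float labels, shape (2, 5)
loss = nn.MSELoss()(preds, targets)             # scalar regression loss
```

With problem_type = "regression" and more than one label, transformers uses an MSE loss over the full label vector like this.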

The issue is that when I run my model with Trainer, it fails with:


    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

It worked with num_labels = 1, but with 5 it throws this error. Below is the minimal code for my model and data.


import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=5,
                                                           problem_type="regression")

Custom Dataset:

class MultiRegressionDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.labels = labels
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        output = tokenizer(self.texts[idx], truncation=True,
                           max_length=128)  # This returns a dict

        output['labels'] = torch.tensor(self.labels[idx])
        return output

data = MultiRegressionDataset(["text1", "text2"], [[1,2,3,4,5], [5,4,3,2,1]])

data[0]  # Works on its own, returns a dict
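To isolate the batching side, here is a toy sketch with made-up, already fixed-length token ids (the ids and the ToyDataset name are illustrative, not from my real code); it just confirms that default collation stacks equal-length float label vectors into a (batch, 5) tensor:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    # Stand-in for MultiRegressionDataset with pre-padded, fixed-length inputs.
    def __init__(self, input_ids, labels):
        self.input_ids = input_ids
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {
            "input_ids": torch.tensor(self.input_ids[idx]),
            # Float dtype: regression losses such as MSE expect float targets.
            "labels": torch.tensor(self.labels[idx], dtype=torch.float),
        }

data = ToyDataset([[101, 7, 102], [101, 8, 102]],
                  [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
batch = next(iter(DataLoader(data, batch_size=2)))
# batch["labels"] is a (2, 5) float tensor
```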

I have tried:

  1. output['labels'] = torch.tensor(self.labels[idx]).unsqueeze(-1)
  2. Combining return_tensors = "pt" with the above

Nothing worked. What am I doing wrong here?