Multilabel sequence classification with Roberta value error expected input batch size to match target batch size

laurb · October 20, 2020, 1:52pm

I’m trying to fine-tune a multilabel (4 labels) model based on roberta-base, following the examples in https://huggingface.co/transformers/custom_datasets.html.

I’m trying to debug this ValueError:

    Traceback (most recent call last):
        trainer.train()
      File "transformers/trainer.py", line 762, in train
        tr_loss += self.training_step(model, inputs)
      File "transformers/trainer.py", line 1112, in training_step
        loss = self.compute_loss(model, inputs)
      File "transformers/trainer.py", line 1136, in compute_loss
        outputs = model(**inputs)
      File "torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "transformers/modeling_roberta.py", line 1015, in forward
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
      File "torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "torch/nn/modules/loss.py", line 916, in forward
        ignore_index=self.ignore_index, reduction=self.reduction)
      File "torch/nn/functional.py", line 2021, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "torch/nn/functional.py", line 1836, in nll_loss
        .format(input.size(0), target.size(0)))
    ValueError: Expected input batch_size (16) to match target batch_size (64).

This is what I see in modeling_roberta at the point of the error (logits first, then labels). It looks like the labels for the whole batch have been flattened into a single tensor of 64 values (16 examples × 4 labels), while the logits still have one row for each of the 16 examples. That seems like the likely cause of the ValueError, but I’m not sure, and I don’t know where the labels would have been flattened. Any ideas?

    tensor([[ 0.1793, 0.1338, -0.2123, -0.0945],
            [ 0.0498, 0.0472, -0.1983, -0.0353],
            [ 0.1932, 0.1970, -0.2003, -0.0471],
            [ 0.0913, 0.1411, -0.1835, -0.1387],
            [ 0.0770, -0.0101, -0.1017, -0.0149],
            [ 0.1980, 0.0772, -0.1894, -0.0487],
            [ 0.0161, 0.0107, -0.0100, 0.0067],
            [ 0.1063, 0.1120, -0.1842, -0.0567],
            [ 0.1610, 0.0769, -0.1609, -0.0883],
            [ 0.1866, 0.0182, -0.1137, -0.1047],
            [ 0.1132, 0.0587, -0.2452, -0.0698],
            [ 0.1680, -0.0125, -0.2019, -0.0674],
            [-0.0282, 0.1099, -0.1637, -0.1112],
            [ 0.1620, 0.1197, -0.2099, 0.0236],
            [ 0.1197, 0.1232, -0.2318, -0.0955],
            [ 0.3232, 0.1935, -0.3226, -0.0547]], device='cuda:0',
           grad_fn=)

labels view

    tensor([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
            0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0], device='cuda:0')
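
The shape mismatch can probably be reproduced outside the Trainer. The following is a minimal sketch, assuming only PyTorch is installed: the random logits and the multi-hot label values are made up for illustration, and the two `view` calls mirror the line from modeling_roberta.py shown in the traceback.

```python
import torch
import torch.nn as nn

batch_size, num_labels = 16, 4  # same sizes as in the error message

# Illustrative stand-ins for the model output and the multi-hot labels.
logits = torch.randn(batch_size, num_labels)
labels = torch.zeros(batch_size, num_labels, dtype=torch.long)
labels[:, 3] = 1  # hypothetical: every example has only the last label set

# Same reshaping as in the traceback's forward() line.
flat_logits = logits.view(-1, num_labels)  # still 16 rows of 4 logits
flat_labels = labels.view(-1)              # 16 * 4 = 64 entries

try:
    nn.CrossEntropyLoss()(flat_logits, flat_labels)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(tuple(flat_logits.shape), tuple(flat_labels.shape))
print(error_message)
```

Cross-entropy expects one class index per row of logits, so a (16, 4) multi-hot label tensor ends up as 64 targets against 16 rows.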

jwa018 · March 2, 2021, 1:05pm

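I had the same problem. The error is raised in nll_loss: the model’s built-in loss is cross-entropy, which expects a single class index per example, so the (16, 4) label tensor gets flattened into 64 targets against 16 rows of logits. For multilabel problems, BCEWithLogitsLoss is the most common choice, I think. You can subclass Trainer and override compute_loss in your custom trainer to make things work. This worked for me: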


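    import torch as th
    from transformers import Trainer

    class CustomTrainer(Trainer):
        # **kwargs absorbs extra arguments that newer Trainer versions may pass
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            outputs = model(
                input_ids=inputs['input_ids'],
                attention_mask=inputs['attention_mask'],
                # RoBERTa tokenizers often omit token_type_ids, so fall back to None
                token_type_ids=inputs.get('token_type_ids')
            )
            # BCEWithLogitsLoss needs float targets shaped (batch_size, num_labels)
            loss = th.nn.BCEWithLogitsLoss()(outputs['logits'],
                                             inputs['labels'].float())
            return (loss, outputs) if return_outputs else loss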
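
To see why BCEWithLogitsLoss fits here, the sketch below (assuming only PyTorch, with made-up logits and labels in the 16 × 4 shape from the question) checks that it takes the logits and multi-hot labels in the same shape and matches a hand-computed average of per-label sigmoid cross-entropies. The variable names are illustrative and not part of any library.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_labels = 16, 4

# Illustrative data: raw logits and multi-hot integer labels, both (16, 4).
logits = torch.randn(batch_size, num_labels)
labels = torch.randint(0, 2, (batch_size, num_labels))

# The loss takes float targets with the same shape as the logits.
targets = labels.float()
loss = nn.BCEWithLogitsLoss()(logits, targets)

# Hand computation: each label is an independent yes/no decision,
# and the default reduction averages over all 16 * 4 entries.
probs = torch.sigmoid(logits)
manual = -(targets * torch.log(probs)
           + (1 - targets) * torch.log(1 - probs)).mean()

print(loss.item(), manual.item())
```

No flattening of the labels is involved, so the batch dimension of the targets stays at 16 and matches the logits.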
