Trying to fine-tune a multilabel (4 labels) model based on roberta-base. I've followed the examples in https://huggingface.co/transformers/custom_datasets.html.
Trying to debug this ValueError:
Traceback (most recent call last):
trainer.train()
File "transformers/trainer.py", line 762, in train
tr_loss += self.training_step(model, inputs)
File "transformers/trainer.py", line 1112, in training_step
loss = self.compute_loss(model, inputs)
File "transformers/trainer.py", line 1136, in compute_loss
outputs = model(**inputs)
File "torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "transformers/modeling_roberta.py", line 1015, in forward
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
File "torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "torch/nn/modules/loss.py", line 916, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "torch/nn/functional.py", line 2021, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "torch/nn/functional.py", line 1836, in nll_loss
.format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (16) to match target batch_size (64).
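For reference, the same mismatch can be reproduced outside the Trainer. This is a minimal sketch, assuming a batch of 16 examples with 4 multi-hot labels each; the random tensors are stand-ins for the real model output and dataset labels, and only the `view` calls are copied from the traceback.

```python
import torch
from torch import nn

batch_size, num_labels = 16, 4

# Stand-ins (assumed shapes): one row of 4 logits and 4 multi-hot labels per example.
logits = torch.randn(batch_size, num_labels)
labels = torch.randint(0, 2, (batch_size, num_labels))

# Same reshaping as the loss line in modeling_roberta.py from the traceback.
flat_logits = logits.view(-1, num_labels)  # stays [16, 4]
flat_labels = labels.view(-1)              # becomes [64]

loss_fct = nn.CrossEntropyLoss()
try:
    loss_fct(flat_logits, flat_labels)
    error = None
except (ValueError, RuntimeError) as exc:
    # The exact exception type may differ between PyTorch versions.
    error = exc

print(flat_logits.shape, flat_labels.shape)
print(error)
```

CrossEntropyLoss expects a single class index per example, so 16 rows of logits need 16 targets, not 64.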
I printed the logits and the flattened labels in modeling_roberta at the point of the error (output below: first the logits, then the labels view). It looks like the labels for all examples in the batch have been flattened into a single tensor of 64 values, while the logits still have one row of 4 values for each of the 16 examples. This seems like it might be the cause of the ValueError, but I'm not sure, and I don't know where the labels would have been flattened. Any ideas?
tensor([[ 0.1793,  0.1338, -0.2123, -0.0945],
[ 0.0498,  0.0472, -0.1983, -0.0353],
[ 0.1932,  0.1970, -0.2003, -0.0471],
[ 0.0913,  0.1411, -0.1835, -0.1387],
[ 0.0770, -0.0101, -0.1017, -0.0149],
[ 0.1980,  0.0772, -0.1894, -0.0487],
[ 0.0161,  0.0107, -0.0100,  0.0067],
[ 0.1063,  0.1120, -0.1842, -0.0567],
[ 0.1610,  0.0769, -0.1609, -0.0883],
[ 0.1866,  0.0182, -0.1137, -0.1047],
[ 0.1132,  0.0587, -0.2452, -0.0698],
[ 0.1680, -0.0125, -0.2019, -0.0674],
[-0.0282,  0.1099, -0.1637, -0.1112],
[ 0.1620,  0.1197, -0.2099,  0.0236],
[ 0.1197,  0.1232, -0.2318, -0.0955],
[ 0.3232,  0.1935, -0.3226, -0.0547]], device='cuda:0',
grad_fn=)
labels view
tensor([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0], device='cuda:0')
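For comparison, here is a sketch of a loss that accepts one 4-element label vector per example, which is the shape the data above seems to have. It assumes the labels are multi-hot (each of the 4 labels independently 0 or 1) and uses `torch.nn.BCEWithLogitsLoss`, which is a common choice for multilabel problems but is not what the model in the traceback applies. The tensors are random stand-ins, and wiring this into the Trainer (for example by overriding `compute_loss`) is not shown.

```python
import torch
from torch import nn

batch_size, num_labels = 16, 4

# Stand-ins (assumed shapes): logits [16, 4] and multi-hot labels [16, 4].
logits = torch.randn(batch_size, num_labels)
labels = torch.randint(0, 2, (batch_size, num_labels)).float()  # BCE needs float targets

# Binary cross-entropy per label, averaged over all 16 * 4 entries.
loss_fct = nn.BCEWithLogitsLoss()
loss = loss_fct(logits, labels)

# Per-label predictions: sigmoid on each logit, thresholded at 0.5 (threshold is an assumption).
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()

print(loss.item(), preds.shape)
```

With this loss no flattening is needed, because logits and labels have the same shape.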