Hi,
I am following this fantastic notebook to fine-tune a multi-class classifier.
Context:
- I am using my own dataset.
- The dataset is a CSV file with two columns: text and label.
- The labels are all numbers.
- I have 7 labels.
- When loading the pre-trained model, I am assigning num_labels=7.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=7)
When training, I am receiving this error:
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
2844 if size_average is not None or reduce is not None:
2845 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
2847
2848
IndexError: Target 7 is out of bounds.
I have tried changing the number of labels to 2 and 5, but that didn't solve the issue; I still get the out-of-bounds error.
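In case I am misreading my own data, this is roughly the check I can run on the raw labels (the file names and the jobs variable are just placeholders for how I load the CSV):

from datasets import load_dataset

# Load the CSV splits (paths here stand in for my actual files)
jobs = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

# Print the distinct label values, to see whether they run 0-6 or 1-7
print(sorted(set(jobs["train"]["label"])))
print(sorted(set(jobs["test"]["label"])))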
Training arguments:
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokinized_jobs["train"],
    eval_dataset=tokinized_jobs["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()
Here is what the tokenized data looks like:
DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'label', 'text', 'token_type_ids'],
        num_rows: 1598
    })
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'label', 'text', 'token_type_ids'],
        num_rows: 400
    })
})
Sample:
{
    'attention_mask': [1, 1, 1, 1, 1, 1, 1],
    'input_ids': [101, 1015, 1011, 2095, 3325, 6871, 102],
    'label': 2,
    'text': '1-year experience preferred',
    'token_type_ids': [0, 0, 0, 0, 0, 0, 0]
}
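For completeness, the tokenization follows the notebook, roughly like this (the preprocess_function name is just what I call it; the text column matches my CSV):

from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    # Truncate to the model's max length; padding is left to the collator
    return tokenizer(examples["text"], truncation=True)

tokinized_jobs = jobs.map(preprocess_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)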
I tried it on Colab with GPU and TPU.
Any idea what the issue is?