The model trains fine, but whenever I try to run prediction it gets ALMOST all the way through the test set before throwing an error saying an input tensor (the labels, I believe) has one fewer value than expected.
RuntimeError: Input tensor at index 1 has invalid shape [42, 2, 768], but expected [42, 3, 768]
The dataset has 3 labels (0, 1, 2) and I made sure the model is loaded with num_labels=3 as well.
EDIT: restricting the run to a single GPU removed the error, so I think it has to do with how tensors are gathered across processes during distributed evaluation.
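For what it's worth, the mismatch looks like what plain torch.cat does when the non-concatenated dimensions disagree. This is just a minimal sketch of that failure mode (not the actual Trainer internals), using the shapes from the error message, where one shard's final batch has 2 examples instead of 3:

```python
import torch

# Two shards of hidden states with shape [seq_len, batch, hidden]; one shard's
# final batch is smaller (2 examples instead of 3).
full_batch = torch.zeros(42, 3, 768)
last_batch = torch.zeros(42, 2, 768)

try:
    # Concatenating along dim 0 requires dims 1 and 2 to match, so the
    # mismatched batch dimension (3 vs 2) raises a RuntimeError.
    torch.cat([full_batch, last_batch], dim=0)
except RuntimeError as err:
    print(err)  # shape mismatch outside the concat dimension
```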
Code to recreate:
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

def batch_and_tokenize(examples):
    return tokenizer(examples["sentence"])

dataset = load_dataset(
    "financial_phrasebank", "sentences_allagree", split="train"
).train_test_split(test_size=0.2)
tokenized_dataset = dataset.map(batch_and_tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

model = AutoModelForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=3
)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
predictions = trainer.predict(tokenized_dataset["test"])
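Since restricting to one GPU removed the error (see the EDIT above), the workaround I have for now is to hide the extra GPUs from the process before torch/transformers are imported (the device index 0 here is arbitrary, and this obviously sidesteps rather than fixes the distributed gathering issue):

```python
import os

# Expose only one GPU to this process so Trainer does not run distributed
# evaluation; this must be set before torch / transformers are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```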