XLNet trainer.predict() RuntimeError: Input tensor at index 1 has invalid shape (distributed metrics)

The model trains fine, but whenever I try to run prediction it gets almost all the way through the test set before throwing an error that the input tensor (the labels) has one fewer value along a dimension than expected:

RuntimeError: Input tensor at index 1 has invalid shape [42, 2, 768], but expected [42, 3, 768]

The dataset has 3 labels (0, 1, 2), and I made sure the model (AutoModelForSequenceClassification) is configured with num_labels=3 as well.
EDIT: Restricting the run to a single GPU removed the error, so I think it has to do with the distributed metric calculation.
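For reference, this is the workaround I used to restrict the run to one GPU: a minimal sketch that pins the process to the first device via CUDA_VISIBLE_DEVICES (the index "0" is just my first GPU; set it before torch is imported).

```python
import os

# Hide all GPUs except the first one from this process.
# Must run before torch/transformers initialize CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

With this in place the Trainer no longer runs its distributed evaluation path, and the shape error goes away.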

Code to recreate:

from datasets import load_dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

def batch_and_tokenize(examples):
    return tokenizer(examples['sentence'])

dataset = load_dataset("financial_phrasebank", 'sentences_allagree', split='train').train_test_split(test_size=0.2)
tokenized_dataset = dataset.map(batch_and_tokenize, batched=True)

from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased",
                                                           num_labels=3)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    tokenizer=tokenizer,
    data_collator=data_collator
)

trainer.train()

predictions = trainer.predict(tokenized_dataset['test'])
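For context, downstream I only need the predicted class ids, which I take with an argmax over the logits. A minimal sketch with stand-in logits (since the real `predictions.predictions` is what fails to materialize):

```python
import numpy as np

# Stand-in for predictions.predictions, which holds one row of
# 3-class logits per test example.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 0.3, 1.5]])

# argmax over the last axis gives the predicted label ids.
predicted_labels = np.argmax(logits, axis=-1)  # → array([0, 2])
```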