Inconsistencies between BERT and RoBERTa: what am I doing wrong?

Transformers version: 4.18.0
Datasets version: 2.1.0
Python version: 3.8.13

Hello,
I was trying to run a quick, simple test with a pipeline I took from the Hugging Face course.
The code is very simple, as follows:

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "cola")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

from transformers import TrainingArguments

training_args = TrainingArguments("test-trainer")

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)

import numpy as np
preds = np.argmax(predictions.predictions, axis=-1)

from datasets import load_metric

metric = load_metric("glue", "cola")
metric.compute(predictions=preds, references=predictions.label_ids)

The result of this code is {'matthews_correlation': 0.5347381322825221}, so a normal and coherent Matthews correlation for BERT on CoLA.
But if I repeat the exact same code with RoBERTa or ALBERT, the result becomes {'matthews_correlation': 0.0}.

checkpoint = "roberta-base"

This is the only thing I’ve changed.
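
In case it is relevant: I have not verified this, but since the Matthews correlation is exactly 0 when a model predicts the same class for every example, a quick sanity check (hypothetical, not part of my run above) would be to look at the distribution of predicted labels:

import numpy as np

# Hypothetical check: see whether the fine-tuned model collapsed to a single class.
# A single unique value here would explain a Matthews correlation of exactly 0.0.
unique_preds, counts = np.unique(preds, return_counts=True)
print(unique_preds, counts)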