Fine-tuning RoBERTa: got an unexpected keyword argument 'labels'

Hi there,

This is my code to continue training RoBERTa on an additional English corpus:

# Check that PyTorch sees it
import torch
torch.cuda.is_available()
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
#encoded_input = tokenizer(text, return_tensors='pt')
%%time
from transformers import LineByLineTextDataset

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./train/reports.txt",
    block_size=128,
)
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./cyrobertapre",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
%%time
trainer.train()

I am getting a type error:

TypeError: RobertaModel.forward() got an unexpected keyword argument 'labels'

I don’t know what it means.
Any help?

See the RoBERTa docs (huggingface.co): the forward pass expects certain inputs, which are usually the natural output of the tokenizer.
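As a minimal illustration (not from the original thread): the keys the tokenizer returns are exactly the keyword arguments RobertaModel.forward() accepts, and labels is not one of them.

from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# The tokenizer produces input_ids and attention_mask, which the model accepts.
encoded = tokenizer("An example report sentence.", return_tensors="pt")
print(encoded.keys())       # dict_keys(['input_ids', 'attention_mask'])

outputs = model(**encoded)  # works: returns hidden states, no loss

# model(**encoded, labels=encoded["input_ids"])
# would raise the TypeError above, because the headless RobertaModel has no loss to compute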

Hi,

Yes, that’s because you’re loading RobertaModel, whereas you need to load a model with a head on top (it could be a language modeling head, a sequence classification head, etc.). The xxxModel classes in the Transformers library usually don’t include any downstream head, hence they don’t take the labels keyword argument.

In your case, you need to load RobertaForCausalLM, which is RobertaModel with a language modeling head on top for causal language modeling (the task of predicting the next token).
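A minimal sketch of loading a model with a head (an illustration, not from the original thread). Note that the question's data collator uses mlm=True, so RobertaForMaskedLM is the head that matches that setup; RobertaForCausalLM is the class named above for next-token prediction.

from transformers import RobertaForMaskedLM, RobertaForCausalLM

# Matches the DataCollatorForLanguageModeling(mlm=True) setup in the question;
# forward() now accepts labels and returns a masked-LM loss for the Trainer.
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Alternative for causal language modeling (next-token prediction):
# model = RobertaForCausalLM.from_pretrained("roberta-base", is_decoder=True)

# The rest of the snippet (dataset, data collator, TrainingArguments, Trainer) stays unchanged.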

However, using RoBERTa for this task is typically not recommended, as the pre-trained weights of roberta-base were trained with bidirectional attention, which suits tasks like classification and extractive question answering. RoBERTa is an encoder-only Transformer, unlike decoder-only LLMs such as ChatGPT, Llama, Mistral, etc.