Fine-tuning RoBERTa: got an unexpected keyword argument 'labels'

Hi there,

This is my code to continue training RoBERTa on an additional English corpus:

# Check that PyTorch sees it
import torch
torch.cuda.is_available()
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
#encoded_input = tokenizer(text, return_tensors='pt')
%%time
from transformers import LineByLineTextDataset

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./train/reports.txt",
    block_size=128,
)
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./cyrobertapre",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
%%time
trainer.train()

I am getting a type error:

TypeError: RobertaModel.forward() got an unexpected keyword argument 'labels'

I don’t know what it means.
Any help?

See the RoBERTa docs (huggingface.co): the forward pass expects certain inputs, which are usually the natural output of the tokenizer.
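As a minimal illustration (not from the original thread): the keys the tokenizer returns are exactly the keyword arguments RobertaModel.forward() accepts, and labels is not one of them.

from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# The tokenizer produces input_ids and attention_mask, which the model accepts.
encoded = tokenizer("An example report sentence.", return_tensors="pt")
print(encoded.keys())       # dict_keys(['input_ids', 'attention_mask'])

outputs = model(**encoded)  # works: returns hidden states, no loss

# model(**encoded, labels=encoded["input_ids"])
# would raise the TypeError above, because the headless RobertaModel has no loss to compute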

Hi,

Yes, that’s because you’re loading RobertaModel, whereas you need to load a model with a head on top (it could be a language modeling head, a sequence classification head, etc.). The xxxModel classes in the Transformers library usually don’t include any downstream head, hence they don’t take the labels keyword argument.

In your case, you need to load RobertaForCausalLM, which is RobertaModel with a language modeling head on top for causal language modeling (the task of predicting the next token).
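A minimal sketch of loading a model with a head (an illustration, not from the original thread). Note that the question's data collator uses mlm=True, so RobertaForMaskedLM is the head that matches that setup; RobertaForCausalLM is the class named above for next-token prediction.

from transformers import RobertaForMaskedLM, RobertaForCausalLM

# Matches the DataCollatorForLanguageModeling(mlm=True) setup in the question;
# forward() now accepts labels and returns a masked-LM loss for the Trainer.
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Alternative for causal language modeling (next-token prediction):
# model = RobertaForCausalLM.from_pretrained("roberta-base", is_decoder=True)

# The rest of the snippet (dataset, data collator, TrainingArguments, Trainer) stays unchanged.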

However, using RoBERTa for this task is typically not recommended, as the pre-trained weights of roberta-base were trained with bidirectional attention, which suits tasks like classification and extractive question answering. RoBERTa is an encoder-only Transformer, unlike decoder-only LLMs such as ChatGPT, Llama, Mistral, etc.