How do I fine-tune roberta-large for text classification

nickmuchi · December 15, 2021, 11:25am

Hi there,

I have been doing the HF course and decided to apply what I have learned but I have unfortunately encountered some errors at the model.fit() stage.

I extracted the BBC text data as an Excel file from Kaggle and converted it to a DatasetDict as below:

Loaded the tokenizer and tokenized the text features

Train_test_val split

Converted my data to tf_data and padded with DataCollator and instantiated the model

Optimizer and compile:

I am getting the error at the stage below:

I am not sure what I am doing wrong here, as I tried following the steps in the course. Thanks in advance.

nickmuchi · December 15, 2021, 3:10pm

I managed to get it to work by changing "label" to "labels", but now I have a different error during model.fit():

labels.shape is inconsistent with logits.shape

nickmuchi · December 16, 2021, 2:29am

@lewtun are you able to assist please?

nielsr · December 16, 2021, 9:53am

Hi,

It depends on which loss function the model has defined. As explained on Stack Overflow, your labels must be either 1-dimensional or 2-dimensional, depending on the loss function being used.

That is, they must either have shape (batch_size,), in which case they contain the class indices, or shape (batch_size, num_labels), in which case they contain one-hot encoded targets.
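To make the two label shapes concrete, here is a minimal NumPy sketch (not the loss any particular model uses internally). The helper name `softmax_cross_entropy`, the batch size, and the choice of 5 classes are assumptions made for illustration. It computes the same cross-entropy from class indices and from one-hot targets, and rejects any other label shape.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits (hypothetical helper for illustration).

    Accepts labels of shape (batch_size,) holding class indices, or of shape
    (batch_size, num_labels) holding one-hot targets.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    if labels.ndim == 1:
        # Class indices: pick the log-probability of the true class per row.
        picked = log_probs[np.arange(len(labels)), labels]
    elif labels.shape == logits.shape:
        # One-hot targets: the dot product selects the same log-probability.
        picked = (labels * log_probs).sum(axis=-1)
    else:
        raise ValueError(
            f"labels.shape {labels.shape} is inconsistent with logits.shape {logits.shape}"
        )
    return float(-picked.mean())

batch_size, num_labels = 4, 5  # arbitrary sizes, chosen only for the example
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, num_labels))

sparse_labels = np.array([0, 3, 1, 4])              # shape (batch_size,)
one_hot_labels = np.eye(num_labels)[sparse_labels]  # shape (batch_size, num_labels)

sparse_loss = softmax_cross_entropy(logits, sparse_labels)
one_hot_loss = softmax_cross_entropy(logits, one_hot_labels)
```

Both calls give the same loss value. What matters is that the label shape matches what the chosen loss function expects.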

merve · December 16, 2021, 10:05am

@Rocketknight1

Rocketknight1 · December 16, 2021, 4:43pm

Hi @nickmuchi, the key is in the warning that pops up when you compile()! When you compile without specifying a loss, the model will compute loss internally. For this to work, though, the labels need to be in your input dict. We talk about this in the Debugging your Training Pipeline section of the course.

There are two solutions here. One is to change your calls to to_tf_dataset(). Instead of

columns=["attention_mask", "input_ids"],
label_cols=["labels"]

do

columns=["attention_mask", "input_ids", "labels"],

This will put your labels in the input dict, and the model will be able to compute a loss in the forward pass. This is simple, but might cause issues with your accuracy metric. The alternative is to leave the labels where they are and use a proper Keras loss instead. In that case, you would leave the call to to_tf_dataset() unchanged, but change your compile() call to

model.compile(
    optimizer=optimizer,
    metrics=['accuracy'],
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)

That should work, and will allow you to keep using the accuracy metric too. Let me know if you encounter any other problems!
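The two options differ only in where the labels end up in each batch. The sketch below illustrates that difference, using tf.data.Dataset.from_tensor_slices as a stand-in for to_tf_dataset(). The token IDs, labels, and logits are made up for illustration and do not come from the actual dataset or model.

```python
import tensorflow as tf

# Made-up toy data: 4 examples, 3 tokens each, 5 classes (all values are assumptions).
features = {
    "input_ids": tf.constant([[0, 11, 2], [0, 12, 2], [0, 13, 2], [0, 14, 2]]),
    "attention_mask": tf.ones((4, 3), dtype=tf.int32),
}
labels = tf.constant([0, 3, 1, 4])  # class indices, not one-hot

# Option 1: labels live inside the input dict, so the model can compute
# its internal loss during the forward pass.
labels_in_inputs = tf.data.Dataset.from_tensor_slices(
    {**features, "labels": labels}
).batch(2)

# Option 2: labels are a separate element next to the input dict, which is
# the (inputs, targets) layout that a Keras loss and Keras metrics consume.
labels_separate = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# With option 2, the loss passed to compile() receives (labels, logits).
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
fake_logits = tf.random.normal((4, 5), seed=0)  # stand-in for model outputs
loss_value = float(loss_fn(labels, fake_logits))
```

Inspecting labels_in_inputs.element_spec and labels_separate.element_spec shows the two layouts: a single dict that includes "labels", versus a tuple of (input dict, labels).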

nickmuchi · December 16, 2021, 10:35pm

Thanks for your response. I tried your suggestion but I am still getting the error. To be clear, my labels are NOT one-hot encoded.

The weird thing is that it worked when, instead of using to_tf_dataset(), I used tf.data.Dataset.from_tensor_slices(). I wanted to use the former as I was following along with that part of the course.

I appreciate your time and help.

nickmuchi · December 17, 2021, 10:48am

Sorry, I completely misunderstood what you said. I reread it, implemented it, and it worked! Thank you!
