How do I fine-tune roberta-large for text classification?

Hi there,

I have been working through the HF course and decided to apply what I have learned, but unfortunately I have run into some errors along the way.

I extracted BBC text data as an Excel file from Kaggle and converted it to a DatasetDict as below:

Loaded the tokenizer and tokenized the text features

Train_test_val split

Converted my data to tf_data and padded with DataCollator and instantiated the model

Optimizer and compile:

Getting the error at the stage below

Not sure what I am doing wrong here, as I tried following the steps in the course. Thanks in advance.

Managed to get it to work by changing "label" to "labels", but now I have a different error during

labels.shape is inconsistent with logits.shape


@lewtun are you able to assist please?


It depends on which loss function the model has defined. As explained on Stack Overflow, your labels must be either 1-dimensional or 2-dimensional, depending on the loss function being used.

i.e. they must either be of shape (batch_size,) in which case they contain the class indices, or of shape (batch_size, num_labels), in which case they contain one-hot encoded targets.
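To make the two shapes concrete, here is a minimal NumPy sketch. The batch of three examples and the value num_labels = 5 are made up for illustration and are not taken from the thread.

```python
import numpy as np

num_labels = 5  # hypothetical number of classes

# Class indices: one integer per example, shape (batch_size,)
sparse_labels = np.array([2, 0, 4])

# One-hot targets: one row per example, shape (batch_size, num_labels)
one_hot_labels = np.eye(num_labels, dtype="float32")[sparse_labels]

# argmax over the last axis converts one-hot rows back to class indices
recovered = one_hot_labels.argmax(axis=-1)

print(sparse_labels.shape)   # (3,)
print(one_hot_labels.shape)  # (3, 5)
```

In Keras terms, integer class indices pair with SparseCategoricalCrossentropy, while one-hot targets pair with CategoricalCrossentropy.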


Hi @nickmuchi, the key is in the warning that pops up when you compile()! When you compile without specifying a loss, the model will compute loss internally. For this to work, though, the labels need to be in your input dict. We talk about this in the Debugging your Training Pipeline section of the course.

There are two solutions here. One is to change your calls to to_tf_dataset(). Instead of

columns=["attention_mask", "input_ids"],

use

columns=["attention_mask", "input_ids", "labels"],

This will put your labels in the input dict, and the model will be able to compute a loss in the forward pass. This is simple, but might cause issues with your accuracy metric. The alternative option is to leave the labels where they are, but instead to use a proper Keras loss. In that case, you would leave the call to to_tf_dataset() unchanged, but change your compile() call to
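A compile() call along these lines, that is. This is a sketch rather than the exact snippet: it assumes integer class labels and a model that returns raw logits. The small Dense model is a hypothetical stand-in so the example runs without downloading roberta-large, and NUM_LABELS = 5 and the learning rate are assumptions too.

```python
import math

import numpy as np
import tensorflow as tf

NUM_LABELS = 5  # assumption: five news categories

# Hypothetical stand-in for the real model. Any Keras model that outputs
# raw logits of shape (batch_size, NUM_LABELS) is compiled the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(NUM_LABELS),  # no softmax, so outputs are logits
])

# Sparse loss: expects integer class indices of shape (batch_size,)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # assumed value
    loss=loss_fn,
    metrics=["accuracy"],
)

# Sanity check: all-zero logits give a uniform prediction over the
# classes, so the loss should equal ln(NUM_LABELS).
logits = np.zeros((2, NUM_LABELS), dtype="float32")
labels = np.array([0, 3])
loss_value = float(loss_fn(labels, logits))
```

from_logits=True matters here because sequence classification heads return unnormalized scores rather than probabilities.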


That should work, and will allow you to keep using the accuracy metric too. Let me know if you encounter any other problems!
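For reference, the difference between the two options comes down to where the labels sit in each batch. The sketch below builds both layouts with tf.data.Dataset.from_tensor_slices instead of to_tf_dataset(), so that it runs without a tokenizer or a dataset download. The token ids and labels are made-up toy values.

```python
import numpy as np
import tensorflow as tf

# Toy tokenized inputs (made-up ids), two examples of length three
features = {
    "input_ids": np.array([[0, 11, 2], [0, 57, 2]], dtype="int64"),
    "attention_mask": np.ones((2, 3), dtype="int64"),
}
labels = np.array([1, 4], dtype="int64")

# Option 1: labels live inside the input dict, so the model can compute
# its internal loss during the forward pass.
ds_internal = tf.data.Dataset.from_tensor_slices(
    {**features, "labels": labels}
).batch(2)

# Option 2: labels are a separate element, the (features, labels) layout
# that a Keras loss passed to compile() expects.
ds_keras = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

internal_batch = next(iter(ds_internal))
keras_features, keras_labels = next(iter(ds_keras))

print(sorted(internal_batch.keys()))
print(sorted(keras_features.keys()), tuple(keras_labels.shape))
```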

Thanks for your response. I tried your suggestion but I am still getting the error, and to be clear, my labels are NOT one-hot encoded.

Weird thing is that it worked when, instead of using to_tf_dataset(), I used tf.data.Dataset.from_tensor_slices(). I wanted to use the former as I was following along with that part of the course.

Appreciate your time and help.

Sorry, I completely misunderstood what you said. I reread it, implemented it, and it worked!!! Thank you!