I have been doing the HF course and decided to apply what I have learned, but I have unfortunately run into some errors at the model.fit() stage.
I extracted the BBC text data as an Excel file from Kaggle and converted it to a DatasetDict as below:
Loaded the tokenizer and tokenized the text features
Converted my data to a tf.data dataset, padded with a DataCollator, and instantiated the model
Optimizer and compile:
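The optimizer and compile() code didn't survive either. Judging from the reply further down (the model computing its loss internally), the call likely omitted an explicit loss; a minimal sketch under that assumption, with a stand-in Keras model and a guessed learning rate:

```python
import tensorflow as tf

# Stand-in for the transformers model so the sketch runs on its own;
# in the real code `model` is the TFAutoModelForSequenceClassification instance.
model = tf.keras.Sequential([tf.keras.layers.Dense(5)])

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)  # learning rate is a guess

# No loss argument: a transformers model would then compute its loss
# internally, which only works when the labels are in the input dict.
model.compile(optimizer=optimizer, metrics=["accuracy"])
```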
Getting the error at the stage below:
Not sure what I am doing wrong here, as I tried to follow the steps in the course. Thanks in advance!
Managed to get it to work by changing "label" to "labels", but now I have a different error during model.fit:
labels.shape is inconsistent with logits.shape
@lewtun are you able to assist please?
It depends on which loss function the model uses. As explained on Stack Overflow, your labels must be either 1-dimensional or 2-dimensional, depending on the loss function being used.
i.e. they must either have shape (batch_size,), in which case they contain class indices, or shape (batch_size, num_labels), in which case they contain one-hot encoded targets.
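The two shapes correspond to TF's two cross-entropy losses; a minimal demonstration (the logits here are made-up numbers) showing that both conventions give the same loss value:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # (batch_size, num_labels)

# Shape (batch_size,) with integer class indices -> SparseCategoricalCrossentropy
sparse_labels = tf.constant([0, 1])
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
sparse_value = float(sparse_loss(sparse_labels, logits))

# Shape (batch_size, num_labels) with one-hot rows -> CategoricalCrossentropy
one_hot_labels = tf.one_hot(sparse_labels, depth=3)
dense_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
dense_value = float(dense_loss(one_hot_labels, logits))

print(sparse_value, dense_value)  # same value, different label conventions
```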
Hi @nickmuchi, the key is in the warning that pops up when you compile()! When you compile without specifying a loss, the model will compute the loss internally. For this to work, though, the labels need to be in your input dict. We talk about this in the Debugging your Training Pipeline section of the course.
There are two solutions here. One is to change your call to to_tf_dataset() so that the labels are included in the columns argument:
columns=["attention_mask", "input_ids", "labels"],
This will put your labels in the input dict, and the model will be able to compute a loss in the forward pass. This is simple, but might cause issues with your accuracy metric. The alternative is to leave the labels where they are, but use a proper Keras loss instead. In that case, you would leave the call to to_tf_dataset() unchanged, but change your compile() call to pass the loss explicitly.
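The compile() call itself was not preserved in the post; a plausible reconstruction, where the learning rate and the stand-in model are assumptions (SparseCategoricalCrossentropy fits integer class indices, which is what the thread's labels are):

```python
import tensorflow as tf

# Stand-in for the TFAutoModelForSequenceClassification instance so the
# sketch runs on its own; `model` here is NOT the real transformers model.
model = tf.keras.Sequential([tf.keras.layers.Dense(5)])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # learning rate is a guess
    # Transformers classification heads return raw logits, hence from_logits=True.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```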
That should work, and will allow you to keep using the accuracy metric too. Let me know if you encounter any other problems!
Thanks for your response. I tried your suggestion but am still getting the error, and to be clear, my labels are NOT one-hot encoded.
The weird thing is that it worked when, instead of using to_tf_dataset, I used tf.data.Dataset.from_tensor_slices. I wanted to use the former as I was following along with that part of the course.
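For reference, the from_tensor_slices route looks roughly like this (the feature values here are toy stand-ins, not the real tokenized BBC data):

```python
import tensorflow as tf

# Already-padded toy features; from_tensor_slices needs every row to have
# the same length, which is why to_tf_dataset + a collator is usually nicer.
features = {
    "input_ids": tf.constant([[101, 2054, 102], [101, 2003, 102]]),
    "attention_mask": tf.constant([[1, 1, 1], [1, 1, 1]]),
}
labels = tf.constant([0, 1])

tf_train = tf.data.Dataset.from_tensor_slices((features, labels)).shuffle(2).batch(2)

for batch_features, batch_labels in tf_train.take(1):
    print(batch_features["input_ids"].shape, batch_labels.shape)
```

Because the (features, labels) tuple keeps the labels outside the input dict, this pairs naturally with an explicit Keras loss in compile().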
appreciate your time and help
Sorry, I completely misunderstood what you said; I reread it and implemented it, and it worked!!! Thank you!