Error in pad() function of transformers/tokenization_utils_base.py

Hello!

I’ve been going through the fine-tuning part of this tutorial: Getting Started with Sentiment Analysis using Python (huggingface.co)

I was able to make it work using the IMDB dataset that the tutorial covered. I tried to do the same thing with the airline tweets (osanseviero/twitter-airline-sentiment) dataset and I’m getting an error when running the trainer.train() function:


ValueError Traceback (most recent call last)
in <cell line: 2>()
1 # Train the model
----> 2 trainer.train()

8 frames
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, verbose)
   3477     # The model's main input name, usually `input_ids`, has be passed for padding
   3478     if self.model_input_names[0] not in encoded_inputs:
--> 3479         raise ValueError(
   3480             "You should supply an encoding or a list of encodings to this method "
   3481             f"that includes {self.model_input_names[0]}, but you provided {list(encoded_inputs.keys())}"

ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['label']

Code can be seen here: toddpglidden/fine_tuning_distilbert_airline_tweets.ipynb at main · toddpglidden/toddpglidden (github.com)

Any thoughts?

Thanks!

You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['label']

I don't often use Transformers directly, so apologies if I'm off the mark.
Maybe you simply chose or applied the wrong tokenizer.
Judging by the error above, the problem is the format of the data being passed in, not its content: the batch reaching pad() only contains a label column and no input_ids.

I thought it was the tokenizer too, but when I searched for this error, it seems to come up in some other, weirder places. It appears to be raised when something extra is passed along that isn't allowed.
Deleting a few lines may fix it.
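In other words, the Trainer is handing pad() a batch whose only key is label, which usually means the text column was never run through the tokenizer (e.g. via dataset.map) before training. A minimal stdlib-only sketch of the check that raises this error (simplified from the real pad() in tokenization_utils_base.py; the helper name pad_check is made up here):

```python
def pad_check(encoded_inputs, model_input_names=("input_ids",)):
    # Mirrors the guard in tokenizer.pad(): the model's main input name
    # (usually "input_ids") must be present in every example.
    if model_input_names[0] not in encoded_inputs:
        raise ValueError(
            f"You should supply an encoding or a list of encodings to this "
            f"method that includes {model_input_names[0]}, but you provided "
            f"{list(encoded_inputs.keys())}"
        )
    return encoded_inputs


# A batch that was never tokenized only carries the label column and fails:
try:
    pad_check({"label": 1})
except ValueError as e:
    print(e)

# After tokenizing the text column (roughly:
#   dataset.map(lambda x: tokenizer(x["text"], truncation=True))
# ), each example also carries input_ids and the check passes:
pad_check({"input_ids": [101, 2023, 102], "label": 1})
```

So it's worth printing the dataset's column names right before trainer.train() to confirm input_ids is actually there.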

It’s the same tokenizer/model combination that worked on the IMDB data that the tutorial used.


Published February 2, 2022

The problem is probably down to that date.

HF is at the front line of development, for better or worse, so they mostly look forward rather than to the sides or the rear.
It's quite common for backward compatibility to disappear within a few months. FutureWarnings are emitted, but few people read them. And functions that people don't use frequently are rather commonly buggy.

Maybe the tutorial is simply outdated now; try adding site:huggingface.co to a Google search and look for a newer one.
I'm sure some non-HF know-how would be useful too, but Google's search results are fairly useless these days…
I know I sound like an old man, but Google wasn't this useless in the past.
