How padding in huggingface tokenizer works?

You need to change padding to "max_length". The default behavior (with padding=True) is to pad to the length of the longest sentence in the batch, meanwhile sentences longer than specified length are getting truncated to the specified max_length. In your example you have only one sentence, thus there’s no padding (the only sentence is the longest one). Your sentence is shorter than max length, so there’s no truncation either.

1 Like