After some more digging I’ve discovered that the issue is related to this one.
The padding strategies are `['longest', 'max_length', 'do_not_pad']`, and the problem seems to be improper padding. If you set the strategy to `do_not_pad`, it works (for a single sentence): `tokenizer(example)`.
But even when I explicitly set the tokenizer's `max_length` to a number, the issue persists.
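For anyone following along, here is a minimal sketch of what the three strategies are supposed to do, written as a plain-Python `pad_batch` helper (a hypothetical function for illustration, not the transformers internals): `longest` pads each sequence up to the longest one in the batch, `max_length` pads up to a fixed length you must supply, and `do_not_pad` leaves sequences as-is.

```python
def pad_batch(sequences, strategy, max_length=None, pad_id=0):
    """Illustrate the three padding strategies on lists of token ids."""
    if strategy == "do_not_pad":
        # No padding: each sequence keeps its own length.
        return [list(s) for s in sequences]
    if strategy == "longest":
        # Pad everything up to the longest sequence in this batch.
        target = max(len(s) for s in sequences)
    elif strategy == "max_length":
        # Pad everything up to a fixed, caller-supplied length.
        if max_length is None:
            raise ValueError("'max_length' strategy requires max_length")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {strategy!r}")
    return [list(s) + [pad_id] * (target - len(s)) for s in sequences]


# 'longest' pads to the batch maximum (here, length 3).
print(pad_batch([[1, 2, 3], [4]], "longest"))       # → [[1, 2, 3], [4, 0, 0]]
# 'max_length' pads to the fixed length 5.
print(pad_batch([[1, 2, 3], [4]], "max_length", max_length=5))
# 'do_not_pad' returns ragged sequences unchanged.
print(pad_batch([[1, 2, 3], [4]], "do_not_pad"))    # → [[1, 2, 3], [4]]
```

Note that `max_length` on its own does nothing unless the padding (or truncation) strategy actually consumes it, which may be why setting it alone did not help here.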