Maximum recursion depth exceeded when using DataCollator

After some more digging I’ve discovered that the issue is related to this one.

The padding strategies are ['longest', 'max_length', 'do_not_pad']. The issue seems to be improper padding: if you set the strategy to do_not_pad, calling tokenizer(example) works, but only for a single sentence.
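To make the difference between the three strategies concrete, here is a rough pure-Python sketch of what each one does to a batch of token ids. This is not the library's implementation: `pad_batch`, `pad_id`, and the example token ids are all hypothetical names and values made up for illustration.

```python
from typing import List, Optional


def pad_batch(
    batch: List[List[int]],
    padding: str = "longest",
    max_length: Optional[int] = None,
    pad_id: int = 0,  # hypothetical pad token id, chosen for illustration
) -> List[List[int]]:
    """Illustrative helper (not a transformers API) mimicking the three strategies."""
    if padding == "do_not_pad":
        # Sequences are returned as-is, so their lengths may differ.
        return [list(seq) for seq in batch]
    if padding == "longest":
        # Pad every sequence up to the longest one in this batch.
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("max_length must be set when padding='max_length'")
        # Pad every sequence up to a fixed length (no truncation in this sketch).
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    return [list(seq) + [pad_id] * (target - len(seq)) for seq in batch]


# Two made-up sequences of different lengths.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]

ragged = pad_batch(batch, "do_not_pad")
longest = pad_batch(batch, "longest")
fixed = pad_batch(batch, "max_length", max_length=8)

print([len(seq) for seq in ragged])
print([len(seq) for seq in longest])
print([len(seq) for seq in fixed])
```

With do_not_pad the sequences keep different lengths, so they cannot be stacked into one rectangular batch. That would be consistent with it only working for a single sentence.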

However, even when I set the tokenizer's max_length to a fixed number, the issue persists.