Hello, I am trying to tokenize sentence pairs with "bert-base-uncased" using max_length=3, on the batch [["I love it", "You done"], ["Mary do", "Dog eats paper"]], and the tokenizer returns many sequences longer than the max_length I set. Could someone explain this behavior?
If you call it without return_overflowing_tokens, the output is truncated correctly. I also tried your exact code: as soon as return_overflowing_tokens=True is present, it raises an error.
The Hugging Face documentation says:

return_overflowing_tokens (bool, optional, defaults to False) — Whether or not to return overflowing token sequences. If a pair of sequences of input ids (or a batch of pairs) is provided with truncation_strategy = longest_first or True, an error is raised instead of returning overflowing tokens.
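To illustrate, here is a minimal sketch (assuming the transformers package is installed) that reproduces both cases on the batch from the question: truncation works without the flag, and the documented error case is shown commented out, since whether it raises can depend on the tokenizer implementation (the Python "slow" tokenizer raises, as the docs describe).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A batch of sentence pairs, as in the question.
batch = [["I love it", "You done"], ["Mary do", "Dog eats paper"]]

# Without return_overflowing_tokens: every encoded pair is truncated
# to max_length=3 using the default longest_first strategy.
enc = tokenizer(batch, max_length=3, truncation=True)
print(enc["input_ids"])  # each inner list has length 3

# With return_overflowing_tokens=True on a batch of pairs and the
# default longest_first truncation, the docs say an error is raised
# instead of returning overflowing tokens:
# enc = tokenizer(batch, max_length=3, truncation=True,
#                 return_overflowing_tokens=True)  # raises an error
```

To get overflowing tokens for pairs, the truncation strategy has to be changed away from longest_first (for example to only_second), so that it is unambiguous which sequence the overflow comes from.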