I’m trying to pretokenize input sentences using GPT2TokenizerFast, then feed the pretokenized input through the sub-word tokenizer.

I found out that the fast tokenizers in the Hugging Face library allow access to pretokenization through:

This gives me the pretokenized input, and I assumed I could feed it back into the tokenizer with is_split_into_words set to True, as follows:
tokenizer(pretokenized_input, return_tensors='pt', is_split_into_words=True).tokens()
But the result is unexpected. After pretokenizing the sentence 'The person went to the store', I get:
['ĠThe', 'Ġperson', 'Ġwent', 'Ġto', 'Ġthe', 'Ġstore']

But when I feed this through the tokenizer I get:
['ĠÄ', 'ł', 'The', 'ĠÄ', 'ł', 'person', 'ĠÄ', 'ł', 'went', 'ĠÄ', 'ł', 'to', 'ĠÄ', 'ł', 'the', 'ĠÄ', 'ł', 'store']

Why does it add the tokens: ‘ĠÄ’ and ‘ł’?
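
One possible explanation, sketched below without the library itself: the pretokenizer output is already byte-level encoded, so 'Ġ' is a stand-in for the space byte. Feeding 'ĠThe' back in would treat 'Ġ' as an ordinary character (U+0120), whose two UTF-8 bytes are then encoded again. The table below is my reconstruction of a GPT-2 style byte-to-character mapping, and the helper names bytes_to_unicode and byte_level are only illustrative, so treat this as an assumption rather than the tokenizer's actual code path.

```python
def bytes_to_unicode():
    """Build a byte -> printable character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte (space, control bytes, ...) is shifted to 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def byte_level(text):
    """Encode text to UTF-8 and map each byte through the table (hypothetical helper)."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


# First pass: a leading space plus the word, as the pretokenizer would see it.
once = byte_level(" The")
# Second pass: the already-encoded string is fed in again, with a new prefix space.
twice = byte_level(" " + once)

print(once)
print(twice)
```

In this sketch the second pass turns 'Ġ' into the two characters 'Ä' and 'ł' behind a fresh 'Ġ', which lines up with the extra 'ĠÄ' and 'ł' tokens above. If that is indeed the cause, passing the plain words (without the 'Ġ' markers) together with is_split_into_words=True should avoid the double encoding.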