Hi,
I have a problem with the encoding produced by the XLM-RoBERTa SentencePiece tokenizer. Why is every ID in the Hugging Face encoding 1 greater than the corresponding ID in the Google SentencePiece encoding?
Example:
## Hugging Face:
```python
tokenizer_xlmroberta.encode("I don't understand why", add_special_tokens=False)
# Output: [87, 2301, 25, 18, 28219, 15400]
```
## SentencePiece:
```python
tokenizer_xlmroberta_.encode_as_ids("I don't understand why")
# Output: [86, 2300, 24, 17, 28218, 15399]
```
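For context, this is roughly how the two tokenizers are set up on my side (the model name and the path to the `.model` file are just how I loaded them, shown here as an assumption so the example above is reproducible):

```python
from transformers import XLMRobertaTokenizer
import sentencepiece as spm

# Hugging Face tokenizer (model name assumed to be xlm-roberta-base)
tokenizer_xlmroberta = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

# Raw SentencePiece processor on the underlying model file
# (path to the .model file is an assumption)
tokenizer_xlmroberta_ = spm.SentencePieceProcessor()
tokenizer_xlmroberta_.load("sentencepiece.bpe.model")
```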