"Add_tokens" breaks words when encoding

I am using the add_tokens function to enlarge the vocabulary of the pre-trained distilgpt2 tokenizer. But doing so changes the tokenizer's encoding behaviour: it breaks words apart in order to match my new token.
I don't understand why it does this, and whether it is possible to make it behave as it did before.

I provide an example:

from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")

If I encode the word "crypt", the tokenizer does not break it up, even though it contains existing subwords:

print(tokenizer.encode("crypt"))
## [29609]
print(tokenizer.encode("cry"))
## [20470]
print(tokenizer.encode("pt"))
## [457]

But if I add "ryp" as a new token, it breaks the word (and I don't want it broken! I just want to add "ryp" as a standalone token):

tokenizer.add_tokens(["ryp"])
print(tokenizer.encode("crypt"))
# [66, 50257, 83]

Would you know why this happens, and how I can force it to work as before?
I have read the docs about BPE encoding, but I can't find how to force it to use the longest matching token.
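From the ids I get back (66 is "c", 83 is "t", and 50257 is the first id after the original vocabulary), my guess is that added tokens are matched first, splitting the text around them, and only the remaining pieces go through the normal BPE. Here is a toy sketch of that guessed behaviour (not the real tokenizers implementation):

```python
import re

def split_on_added_tokens(text, added_tokens):
    # Toy illustration (NOT the real library code): added tokens are
    # matched eagerly, splitting the text around every occurrence;
    # only the leftover pieces would go through normal BPE.
    pattern = "(" + "|".join(re.escape(t) for t in added_tokens) + ")"
    return [piece for piece in re.split(pattern, text) if piece]

print(split_on_added_tokens("crypt", ["ryp"]))  # ['c', 'ryp', 't']
```

This would explain why "crypt" comes out as c / ryp / t instead of the single whole-word token, which is what makes me think the split happens before BPE ever sees the word.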

Thanks!