`additional_special_tokens` are not added

Hi Hugging Face Community,

I have the following questions regarding special tokens:

  1. Why doesn’t `tokenizer.all_special_tokens` include the `<image>` token? I’m using a LLaVA model whose tokenizer defines `<image>` as a special token (in the `added_tokens_decoder` section of `tokenizer_config.json`). The tokenizer does encode and decode it as a special token. However, when I load the tokenizer and call `tokenizer.all_special_tokens` or `tokenizer.additional_special_tokens`, the `<image>` token is not included.

  2. Where is the `<image>` token loaded? I looked into the `tokenizer.from_pretrained` function, but I couldn’t find where it actually reads the `added_tokens_decoder` section of the config file in which this special token is defined.

  3. Where does the `tokenizer.decode` function treat `<image>` as a special token? I tried to set a breakpoint to trace how it skips `<image>` as a special token, but I ended up in a call loop between `tokenization_utils_base.py` and `tokenization_utils_fast.py`.

It would be really helpful if you could answer any of these questions. Thank you very much!

I’m not allowed to paste more than 2 links in a post, so I will provide the code related to question 3, where I got into a loop:
`tokenization_utils_base.py` and `tokenization_utils_fast.py`