2 tokens for one character in T5

When tokenizing text, the T5 (t5-small) tokenizer appends an eos_token; that's expected.

What's odd is that for the single character/string "0", it produces three tokens, and one of them detokenizes to an empty string.

Is that an error on my part? Is it a bug? How does that happen? T5's tokenizer falls back to characters, right? So at the bare minimum, each character should be in the vocabulary as a single token.

from transformers import AutoTokenizer

tokenizer_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_fast=True)
print(tokenizer.encode("0"))  # [3, 632, 1]
print(tokenizer.encode("1"))
print(tokenizer.encode("2"))
print(tokenizer.encode("3"))

Outputs:
[3, 632, 1]
[209, 1]
[204, 1]
[220, 1]
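One way to dig into this is to map each id back to its raw SentencePiece token string and decode the ids one at a time, which shows which id produces the empty string. A minimal sketch, assuming the t5-small checkpoint can be loaded:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=True)

# Map each id back to its raw SentencePiece token string.
ids = tokenizer.encode("0")
print(tokenizer.convert_ids_to_tokens(ids))

# Decode each id individually to see which one yields an empty string.
for i in ids:
    print(i, repr(tokenizer.decode([i])))
```

convert_ids_to_tokens exposes the underlying SentencePiece pieces (including the "▁" word-boundary marker), whereas decode cleans them up into plain text, which is why a piece can vanish into an empty string on decoding.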