Hi,
Right now I'm experimenting with T5 1.1, and I found something confusing.
According to this issue: Vocab Size does not change when adding new tokens · Issue #12632 · huggingface/transformers · GitHub, tokenizer.vocab_size should contain only the base vocabulary. But when I print len(tokenizer) and tokenizer.vocab_size, they both output the same number, 32100, which, if I'm not mistaken, they shouldn't, right? Because len(tokenizer) is vocab_size + the number of added tokens.
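For reference, here is a minimal sketch of what I mean (assuming the t5-base checkpoint stands in for T5 1.1, and where <my_new_token> is just a placeholder token name I made up):

```python
from transformers import AutoTokenizer

# Assumption: "t5-base" behaves like the T5 1.1 checkpoints here.
tokenizer = AutoTokenizer.from_pretrained("t5-base")

# Fresh from the hub, nothing beyond the base vocabulary has been added,
# so the two counts coincide:
print(tokenizer.vocab_size)  # base vocabulary size (32100 for T5)
print(len(tokenizer))        # base vocabulary + added tokens

# After adding a genuinely new token, only len(tokenizer) should grow:
tokenizer.add_tokens(["<my_new_token>"])
print(tokenizer.vocab_size)  # unchanged
print(len(tokenizer))        # one larger than before
```

If that is the intended behavior, the two numbers would only diverge after add_tokens() is called, which might explain why they match on a freshly loaded tokenizer.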
Can someone clarify this, please?