tokenizer1 = BartTokenizer.from_pretrained('facebook/bart-base')
tokenizer2 = BartTokenizer.from_pretrained('facebook/bart-large')
What’s the difference conceptually? I can understand the difference between the uncased and cased ones for BERT.
But why do bart-base and bart-large each have their own tokenizer?
By the way, bart-base and bart-large have the same “vocab_size” (50265) in their configs.
It is obviously related to the larger number of parameters in bart-large, as mentioned in the description.
facebook/bart-large 24-layer, 1024-hidden, 16-heads, 406M parameters
facebook/bart-base 12-layer, 768-hidden, 16-heads, 139M parameters
Thanks for the reply. But why would a tokenizer depend on the number of the model’s parameters? Isn’t it just responsible for tokenizing the text, and unrelated to the model’s size?
Easy there with the “obviously”. This isn’t obvious, because as @zuujhyt rightfully says, the number of parameters is typically not directly related to the vocab. I.e. the vocab embedding indices often do not change between small/large models, but the model’s blocks get wider and/or deeper. I think this is a good question.
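To make that point concrete, here is a rough sketch of how the token embedding table scales. It only uses the numbers quoted in this thread (vocab_size 50265, hidden sizes 768 and 1024); the helper name `embedding_params` is made up for illustration. The number of rows (one per token id) stays the same, only the width of each row changes.

```python
VOCAB_SIZE = 50265  # same value in both configs, as quoted above


def embedding_params(vocab_size, hidden_size):
    """Number of weights in a (vocab_size x hidden_size) token embedding table."""
    return vocab_size * hidden_size


# Same number of token ids, different embedding width per token.
base = embedding_params(VOCAB_SIZE, 768)    # bart-base hidden size
large = embedding_params(VOCAB_SIZE, 1024)  # bart-large hidden size
print(base, large)
```

So the larger model spends more parameters per token, but the mapping from text to token ids does not need to change.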
Agree, would like to know more about it
Those tokenizers are identical. You can check it by just comparing the files over at https://huggingface.co/facebook/bart-base/tree/main and https://huggingface.co/facebook/bart-large/tree/main
Incidentally, they’re also the same as the ones for
We duplicate tokenizers into their models for ease of use (a model id is all you need)
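If you would rather check this locally than compare the files by eye, something like the following should work. It is only a sketch: `tokenizers_match` is a hypothetical helper, not part of transformers, and it relies solely on the `get_vocab()` and `encode()` methods that the tokenizers expose. Matching vocab and matching encodings on a few samples is strong evidence, not a formal proof, that the two are identical.

```python
def tokenizers_match(tok_a, tok_b, samples):
    """Return True if both tokenizers share a vocab and encode every sample identically."""
    # Same token -> id mapping?
    if tok_a.get_vocab() != tok_b.get_vocab():
        return False
    # Same ids for the same input text?
    return all(tok_a.encode(text) == tok_b.encode(text) for text in samples)


# Usage (downloads the tokenizer files, so it needs network access):
#   from transformers import BartTokenizer
#   tokenizer1 = BartTokenizer.from_pretrained("facebook/bart-base")
#   tokenizer2 = BartTokenizer.from_pretrained("facebook/bart-large")
#   print(tokenizers_match(tokenizer1, tokenizer2, ["Hello world!", "BART tokenization"]))
```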