What’s the difference, conceptually? I can understand the difference between the uncased and cased tokenizers for BERT.
But why this?
btw, bart-base and bart-large have the same “vocab_size”: 50265 in their configs.
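For reference, here is a minimal sketch of how to check that, assuming the public facebook/bart-base and facebook/bart-large checkpoints:

```python
# Quick check: compare the vocabulary size reported by the config
# and by the tokenizer for both BART checkpoints.
from transformers import AutoConfig, AutoTokenizer

for name in ["facebook/bart-base", "facebook/bart-large"]:
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(name,
          "config vocab_size:", config.vocab_size,
          "tokenizer vocab_size:", tokenizer.vocab_size)
# Both report vocab_size = 50265.
```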
Thanks.
It is obviously related to the larger number of parameters used in bart-large, as mentioned in the description:

- facebook/bart-large: 24-layer, 1024-hidden, 16-heads, 406M parameters
- facebook/bart-base: 12-layer, 768-hidden, 16-heads, 139M parameters
Thanks for the reply, but why would a tokenizer depend on the number of model parameters? Isn’t it just responsible for tokenizing the text corpus, independent of the model’s size?
Easy there with the “obviously”. This isn’t obvious because, as @zuujhyt rightly says, the number of parameters is typically not directly related to the vocabulary. That is, the vocabulary (and hence the embedding indices) usually stays the same between the small and large variants of a model; it’s the model’s blocks that get wider and/or deeper. I think this is a good question.
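To make that concrete, here is a small sketch (again assuming the public HF checkpoints) that compares the two configs: the vocabulary size is identical, and it’s only the width and depth that change.

```python
# Compare architectural hyperparameters of bart-base and bart-large.
# Only the width (d_model) and depth (encoder/decoder layers) differ;
# the vocabulary, and therefore the embedding table's first dimension,
# is the same for both.
from transformers import AutoConfig

base = AutoConfig.from_pretrained("facebook/bart-base")
large = AutoConfig.from_pretrained("facebook/bart-large")

for attr in ["vocab_size", "d_model",
             "encoder_layers", "decoder_layers",
             "encoder_attention_heads", "decoder_attention_heads"]:
    print(f"{attr:>24}: base={getattr(base, attr)}, large={getattr(large, attr)}")
```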