While checking the model descriptions in the pretrained_models list (https://huggingface.co/transformers/pretrained_models.html),
I found what seems to be a mistake regarding BART.
For facebook/bart-large-cnn, the description reads:
12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) bart-large base architecture finetuned on cnn summarization task
If my understanding is correct, shouldn't
(same as base) instead read
(same as large)?
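As a rough sanity check (assuming the standard BART dimensions: d_model 768 vs. 1024, 6 vs. 12 layers per stack, FFN width 4x d_model, and a ~50k vocabulary), a back-of-the-envelope parameter count puts bart-base around 140M and bart-large around 400M, which matches the 406M figure listed for bart-large-cnn:

```python
# Back-of-the-envelope parameter count for a BART-style encoder-decoder.
# Ignores biases and layer norms (only a few percent of the total).
def approx_params(d_model, ffn_dim, enc_layers, dec_layers,
                  vocab=50265, max_pos=1024):
    embed = vocab * d_model + max_pos * d_model  # token + positional embeddings
    attn = 4 * d_model * d_model                 # Q, K, V, output projections
    ffn = 2 * d_model * ffn_dim                  # two feed-forward linear layers
    enc = enc_layers * (attn + ffn)              # encoder: self-attention + FFN
    dec = dec_layers * (2 * attn + ffn)          # decoder adds cross-attention
    return embed + enc + dec

base = approx_params(768, 3072, 6, 6)       # bart-base-style dimensions
large = approx_params(1024, 4096, 12, 12)   # bart-large-style dimensions
print(f"base ~ {base / 1e6:.0f}M, large ~ {large / 1e6:.0f}M")
# prints: base ~ 138M, large ~ 405M
```

So 406M parameters is consistent with the large architecture, not the base one.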
I'm sorry if my understanding is wrong, or if someone has already noticed and fixed this.
Thank you in advance.