Hi,
I’m yusukemori.
While checking the model descriptions in the pretrained_models list (https://huggingface.co/transformers/pretrained_models.html),
I found what seems to be a mistake regarding BART.
Regarding facebook/bart-large-cnn, the description is as follows:
12-layer, 1024-hidden, 16-heads, 406M parameters (same as base)
bart-large base architecture finetuned on cnn summarization task
If my understanding is correct, shouldn't 12-layer be 24-layer, and (same as base) be (same as large)?
(facebook/bart-large itself is listed as 24-layer, i.e., 12 encoder layers plus 12 decoder layers, and bart-large-cnn shares that architecture.)
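One way to double-check is to read the checkpoint's config from the Hub. Below is a minimal sketch using transformers' AutoConfig; the encoder_layers / decoder_layers / d_model / encoder_attention_heads fields are the ones I see in BartConfig, and I'm assuming the table counts encoder plus decoder layers for seq2seq models:

```python
from transformers import AutoConfig

# Load the configuration of the fine-tuned checkpoint from the Hub.
config = AutoConfig.from_pretrained("facebook/bart-large-cnn")

# BART reports encoder and decoder depth separately.
print(config.encoder_layers)                           # expecting 12
print(config.decoder_layers)                           # expecting 12
print(config.encoder_layers + config.decoder_layers)   # 24 in total, same as bart-large

# Hidden size and attention heads should match the listed 1024-hidden, 16-heads.
print(config.d_model)                                  # expecting 1024
print(config.encoder_attention_heads)                  # expecting 16
```

If those numbers come out as above, the entry should probably read 24-layer and (same as large).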
I’m sorry if my understanding is wrong, or if someone has already noticed and fixed this.
Thank you in advance.
yusukemori