Hi,
I’m yusukemori.
While checking the model descriptions in the pretrained_models list (https://huggingface.co/transformers/pretrained_models.html),
I found what seems to be a mistake regarding BART.
Regarding facebook/bart-large-cnn, the description is as follows:
12-layer, 1024-hidden, 16-heads, 406M parameters (same as base)
bart-large base architecture finetuned on cnn summarization task
If my understanding is correct, shouldn't 12-layer be 24-layer, and (same as base) be (same as large)?
(facebook/bart-large itself is listed as 24-layer, i.e., 12 encoder layers plus 12 decoder layers, and bart-large-cnn shares that architecture.)
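One way to double-check is to read the checkpoint's config from the Hub. Below is a minimal sketch using transformers' AutoConfig; the encoder_layers / decoder_layers / d_model / encoder_attention_heads fields are the ones I see in BartConfig, and I'm assuming the table counts encoder plus decoder layers for seq2seq models:

```python
from transformers import AutoConfig

# Load the configuration of the fine-tuned checkpoint from the Hub.
config = AutoConfig.from_pretrained("facebook/bart-large-cnn")

# BART reports encoder and decoder depth separately.
print(config.encoder_layers)                           # expecting 12
print(config.decoder_layers)                           # expecting 12
print(config.encoder_layers + config.decoder_layers)   # 24 in total, same as bart-large

# Hidden size and attention heads should match the listed 1024-hidden, 16-heads.
print(config.d_model)                                  # expecting 1024
print(config.encoder_attention_heads)                  # expecting 16
```

If those numbers come out as above, the entry should probably read 24-layer and (same as large).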
I’m sorry if my understanding is wrong, or if someone has already noticed and fixed this.
Thank you in advance.
yusukemori