Failed to train bart-cnn from bart-base using my own code

Update: problem solved. It was caused by a bug in my model-saving function.
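
For anyone who hits the same thing: a save/load round-trip check can surface this kind of bug early. Below is a minimal sketch assuming the standard `save_pretrained` / `from_pretrained` API; the output directory is a placeholder, not my actual code.

```python
import torch
from transformers import BartForConditionalGeneration

def save_and_verify(model, out_dir="checkpoint-sanity"):
    """Save the model, reload it, and confirm the weights round-trip exactly."""
    model.save_pretrained(out_dir)
    reloaded = BartForConditionalGeneration.from_pretrained(out_dir)
    saved = reloaded.state_dict()
    for name, p in model.state_dict().items():
        # compare every in-memory tensor with the reloaded checkpoint
        if not torch.allclose(p.detach().cpu(), saved[name], atol=1e-6):
            raise RuntimeError(f"weight mismatch in {name}: checkpoint differs from in-memory model")
    print("save/load round trip OK")
```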

There is not much difference in the ROUGE scores before and after training, and the generated summaries for the same article are almost identical. I could use some suggestions.

I used the original settings from fairseq:

  • label_smoothed_cross_entropy,
  • label-smoothing 0.1,
  • attention-dropout 0.1,
  • weight-decay 0.01,
  • lr-scheduler polynomial_decay,
  • adam-betas "(0.9, 0.999)",
  • adam-eps 1e-08,
  • LR=3e-05

Some differences:

  • batch-size 3,
  • gradient_accumulation_steps 16,
  • adamW,
  • fp32,
  • clip-norm 1.0,

Trained for 7 hours on 2 Tesla P4 GPUs.
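
For context, the setup is roughly the sketch below on the Hugging Face transformers side. This is a simplified reconstruction rather than my exact training loop; `train_loader` and the warmup/total step counts are placeholders.

```python
import torch
from torch.optim import AdamW
from transformers import BartForConditionalGeneration, get_polynomial_decay_schedule_with_warmup

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").cuda()

optimizer = AdamW(
    model.parameters(),
    lr=3e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,       # placeholder
    num_training_steps=20000,   # placeholder
)

# label-smoothed cross entropy (smoothing 0.1); -100 marks padded label positions
loss_fct = torch.nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)
accum_steps = 16  # gradient_accumulation_steps

model.train()
for step, batch in enumerate(train_loader):  # train_loader yields batches of size 3
    labels = batch["labels"].cuda()
    outputs = model(
        input_ids=batch["input_ids"].cuda(),
        attention_mask=batch["attention_mask"].cuda(),
        labels=labels,
    )
    # recompute the loss with label smoothing instead of using outputs.loss
    logits = outputs.logits
    loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip-norm 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```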

Loss curve (my fault that it is hard to read):

The model does not seem to converge.

ROUGE scores before training:

"rouge1": 37.325248080026086,
"rouge2": 18.03751262341448,
"rougeL": 25.82438158757282

ROUGE scores after training:

"rouge1": 37.818,
"rouge2": 17.246,
"rougeL": 23.7038


AFAIK there isn't a bart-base checkpoint trained for summarization; how did you get those metrics without training?

I directly calculated the ROUGE score on CNN/DailyMail before training.
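
Roughly like the sketch below (assuming the `datasets` and `evaluate` libraries; the beam-search settings and the 100-example subset are illustrative, not exactly what I ran):

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").cuda().eval()
rouge = evaluate.load("rouge")

test_set = load_dataset("cnn_dailymail", "3.0.0", split="test")

preds, refs = [], []
for example in test_set.select(range(100)):  # small subset for a quick check
    inputs = tokenizer(example["article"], truncation=True, max_length=1024,
                       return_tensors="pt").to("cuda")
    with torch.no_grad():
        ids = model.generate(**inputs, num_beams=4, max_length=142, min_length=56)
    preds.append(tokenizer.decode(ids[0], skip_special_tokens=True))
    refs.append(example["highlights"])

print(rouge.compute(predictions=preds, references=refs))
```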

I think there are bugs in my code, but I have no idea what kind of bug would cause this problem.