Hi, I am trying to reproduce the reported BART summarization results, but my scores do not match the claimed performance. I tried both sshleifer/distilbart-cnn-12-6 and facebook/bart-large-cnn and hit the same problem with each.
My generation process is a modified version of the released summarization pipeline:
python run_eval.py sshleifer/distilbart-cnn-12-6 $DATA_DIR/test.source $OUTPUT_FILE \
--reference_path $DATA_DIR/test.target \
--task summarization \
--device cuda \
--fp16 \
--bs 32
My performance without post-processing:
1 ROUGE-1 Average_R: 0.48286 (95%-conf.int. 0.48036 - 0.48554)
1 ROUGE-1 Average_P: 0.33581 (95%-conf.int. 0.33356 - 0.33802)
1 ROUGE-1 Average_F: 0.38536 (95%-conf.int. 0.38338 - 0.38737)
---------------------------------------------
1 ROUGE-2 Average_R: 0.20405 (95%-conf.int. 0.20148 - 0.20648)
1 ROUGE-2 Average_P: 0.14260 (95%-conf.int. 0.14067 - 0.14449)
1 ROUGE-2 Average_F: 0.16314 (95%-conf.int. 0.16108 - 0.16517)
---------------------------------------------
1 ROUGE-L Average_R: 0.40419 (95%-conf.int. 0.40174 - 0.40665)
1 ROUGE-L Average_P: 0.28191 (95%-conf.int. 0.27984 - 0.28396)
1 ROUGE-L Average_F: 0.32309 (95%-conf.int. 0.32111 - 0.32509)
My performance with post-processing (from ProphetNet):
1 ROUGE-1 Average_R: 0.49758 (95%-conf.int. 0.49505 - 0.50028)
1 ROUGE-1 Average_P: 0.35663 (95%-conf.int. 0.35421 - 0.35889)
1 ROUGE-1 Average_F: 0.40406 (95%-conf.int. 0.40200 - 0.40607)
---------------------------------------------
1 ROUGE-2 Average_R: 0.21882 (95%-conf.int. 0.21622 - 0.22125)
1 ROUGE-2 Average_P: 0.15750 (95%-conf.int. 0.15543 - 0.15947)
1 ROUGE-2 Average_F: 0.17794 (95%-conf.int. 0.17576 - 0.17998)
---------------------------------------------
1 ROUGE-L Average_R: 0.41627 (95%-conf.int. 0.41375 - 0.41881)
1 ROUGE-L Average_P: 0.29928 (95%-conf.int. 0.29712 - 0.30132)
1 ROUGE-L Average_F: 0.33860 (95%-conf.int. 0.33658 - 0.34056)
The expected performance for sshleifer/distilbart-cnn-12-6 is ?/21.26/30.59 (ROUGE-1/ROUGE-2/ROUGE-L F-scores), but I can only achieve 40.41/17.79/33.86. Is the gap caused by a post-processing trick? If not, how can I reproduce the expected performance?
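To illustrate why I suspect post-processing, here is a minimal sketch of how casing and punctuation handling alone can move ROUGE-N. This is a simplified n-gram overlap, not the official ROUGE-1.5.5 script, and the `normalize` helper is a hypothetical stand-in, not the actual ProphetNet post-processing:

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N on whitespace tokens: returns (precision, recall, f1)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def normalize(text):
    """Hypothetical post-processing: lowercase and split off basic punctuation."""
    return re.sub(r"([.,!?])", r" \1", text.lower())


# Toy data (assumed): the reference is lowercased and pre-tokenized,
# while the raw model output is cased and untokenized.
reference = "the cat sat on the mat ."
raw_output = "The cat sat on the mat."

print("raw:       ", rouge_n(raw_output, reference))
print("normalized:", rouge_n(normalize(raw_output), reference))
```

On this toy pair the same summary scores lower before normalization, only because "The" and "mat." fail to match the reference tokens. A mismatch like this between my outputs and test.target might explain part of the gap, but I have not confirmed that.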
Thank you!