bart-base ROUGE scores

Has anyone fine-tuned bart-base on the XSum or CNN summarization task, and would you be willing to report the ROUGE score you got?
I just got 15.5 on XSum, which feels low, since bart-large can get to 22-ish.

@colanim @valhalla @VictorSanh ?

@sshleifer, could it be due to the adjust_logits issue? Just a guess, but as I posted there, after modifying adjust_logits_during_generation the BLEU-4 score for my bart-base model went from 13.09 to 19.14.

@sshleifer, could you also try using bos as decoder_start_token_id and modifying adjust_logits_during_generation to return the logits as is instead of forcing bos? If you also get a bump in ROUGE score, we can confirm the issue. Thanks!
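
For anyone following along, here is a toy sketch of the difference between forcing bos and returning the logits as is. It is plain Python, not the library code: the function names, the 4-token vocabulary, the token id and the assumption that forcing happens at the first generated step (cur_len == 1) are all made up for illustration.

```python
import math

def force_token(logits, token_id):
    """Return a copy where every token except token_id is masked to -inf."""
    return [score if i == token_id else -math.inf for i, score in enumerate(logits)]

def adjust_logits_forced(logits, cur_len, bos_token_id):
    # Assumed behaviour: at the first generated position only bos is allowed.
    if cur_len == 1:
        return force_token(logits, bos_token_id)
    return logits

def adjust_logits_passthrough(logits, cur_len):
    # Proposed change: return the logits as is.
    return logits

def argmax(values):
    """Index of the largest value, i.e. the token greedy decoding would pick."""
    return max(range(len(values)), key=values.__getitem__)

logits = [0.1, 2.5, -1.0, 0.7]  # made-up scores over a 4-token vocabulary
BOS = 0                          # hypothetical bos token id

print(argmax(adjust_logits_forced(logits, cur_len=1, bos_token_id=BOS)))  # 0
print(argmax(adjust_logits_passthrough(logits, cur_len=1)))               # 1
```

With forcing, the first generated token is always bos no matter what the model prefers; without it, the model's own top choice is kept.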

A possible suggestion that avoids re-training: check the perplexity values and compare them to the paper.
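
A minimal sketch of that check, assuming you already have the per-token log-probabilities (natural log) that the model assigns to the reference summaries. The perplexity helper and the numbers below are hypothetical, and the result is only comparable to a published figure if the tokenization and averaging match.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Made-up example: every reference token gets probability 0.25.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))  # approximately 4.0
```

If you only have the mean cross-entropy loss from an evaluation run, math.exp(loss) gives the same quantity, provided the loss is averaged per token.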

I got 16.6 ROUGE-2 on XSum in 3 epochs / 6 hrs.

bart-base doesn't seem to be that good, then. In my other seq2seq experiment, t5-small performed similarly to or better than bart-base.

I made a Google Doc to aggregate experiment results. Please add any interesting results!

How can I change adjust_logits_during_generation? Thanks.

By editing the code!

Can you provide an example? I looked at the source code of adjust_logits_during_generation, and it directly returns the logits.

In the future, run git grep adjust_logits_during_generation in the repo to find every place it is defined.

thanks :heart: :heart: :heart: