Has anyone finetuned bart-base
on the XSUM or CNN summarization tasks and is willing to report the ROUGE score they got?
I just got 15.5 for XSUM, which feels low since bart-large can get to 22-ish.
@sshleifer, could it be due to the adjust_logits
issue? Just a guess, but as I posted there, after modifying adjust_logits_during_generation
the BLEU-4 score for my model went from 13.09 to 19.14 for bart-base.
@sshleifer could you also try using bos
as decoder_start_token_id
and modifying adjust_logits_during_generation
to return the logits as-is instead of forcing bos?
If you also get a bump in the ROUGE score, we can confirm the issue. Thanks!
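For anyone following along, here is a minimal toy sketch of the change being proposed. The class and method names below are stand-ins for illustration only: in transformers the real method lives on BartForConditionalGeneration and its exact signature varies across library versions, so treat this as the pattern rather than a drop-in patch.

```python
class BartLikeModel:
    """Stand-in for the original behavior: force BOS at the first decoding step."""

    bos_token_id = 0

    def adjust_logits_during_generation(self, logits, cur_len):
        if cur_len == 1:
            # Original behavior: mask everything except the BOS token.
            masked = [float("-inf")] * len(logits)
            masked[self.bos_token_id] = 0.0
            return masked
        return logits


class PatchedModel(BartLikeModel):
    """Proposed fix: return the logits unchanged instead of forcing BOS."""

    def adjust_logits_during_generation(self, logits, cur_len):
        return logits
```

With the patched version, the first-step distribution is left intact, which (combined with setting decoder_start_token_id to the bos token id when calling generate) is the experiment being suggested above.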
A possible suggestion that would save on re-training: check the perplexity values and compare them to the paper.
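In case it helps with that comparison: perplexity is just the exponential of the mean per-token cross-entropy loss (a standard identity, not something specific to this thread), so it can be read off an eval loss without any re-training.

```python
import math


def perplexity(mean_cross_entropy_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), with the loss in nats."""
    return math.exp(mean_cross_entropy_loss)


# e.g. an eval loss of about 2.0 nats/token corresponds to perplexity ~7.4
```

Note this assumes the eval loss uses the natural log (as PyTorch's cross-entropy does); if your numbers are in bits, use 2 as the base instead.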
I got 16.6 ROUGE-2 on XSUM in 3 epochs / 6 hrs.
bart-base doesn’t seem to be good then; in my other seq2seq experiments, t5-small performed similar to or better than bart-base.
Made a Google Doc to aggregate experiment results. Please add any interesting results!
How can I change adjust_logits_during_generation? Thanks
By editing the code!
Can you provide an example? I looked at the source code of adjust_logits_during_generation
and it directly returns the logits.
In the future, run git grep adjust_logits_during_generation
thanks