Failed to train bart-cnn from bart-base using my own code

Update: problem solved. It was caused by a bug in my model-saving function.
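
For anyone who hits the same thing: a save/load round-trip check can surface this kind of bug early. Below is a minimal sketch assuming the standard `save_pretrained` / `from_pretrained` API; the output directory is a placeholder, not my actual code.

```python
import torch
from transformers import BartForConditionalGeneration

def save_and_verify(model, out_dir="checkpoint-sanity"):
    """Save the model, reload it, and confirm the weights round-trip exactly."""
    model.save_pretrained(out_dir)
    reloaded = BartForConditionalGeneration.from_pretrained(out_dir)
    saved = reloaded.state_dict()
    for name, p in model.state_dict().items():
        # compare every in-memory tensor with the reloaded checkpoint
        if not torch.allclose(p.detach().cpu(), saved[name], atol=1e-6):
            raise RuntimeError(f"weight mismatch in {name}: checkpoint differs from in-memory model")
    print("save/load round trip OK")
```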

There is not much difference in the ROUGE scores before and after training, and the generated summaries for the same article are almost identical. I could use some suggestions.

I used the original settings from fairseq:

  • label_smoothed_cross_entropy,
  • label-smoothing 0.1,
  • attention-dropout 0.1,
  • weight-decay 0.01,
  • lr-scheduler polynomial_decay,
  • adam-betas "(0.9, 0.999)",
  • adam-eps 1e-08,
  • LR=3e-05

Some differences:

  • batch-size 3,
  • gradient_accumulation_steps 16,
  • adamW,
  • fp32,
  • clip-norm 1.0,

Trained for 7 hours on 2 Tesla P4 GPUs.
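
For context, the setup is roughly the sketch below on the Hugging Face transformers side. This is a simplified reconstruction rather than my exact training loop; `train_loader` and the warmup/total step counts are placeholders.

```python
import torch
from torch.optim import AdamW
from transformers import BartForConditionalGeneration, get_polynomial_decay_schedule_with_warmup

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").cuda()

optimizer = AdamW(
    model.parameters(),
    lr=3e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,       # placeholder
    num_training_steps=20000,   # placeholder
)

# label-smoothed cross entropy (smoothing 0.1); -100 marks padded label positions
loss_fct = torch.nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)
accum_steps = 16  # gradient_accumulation_steps

model.train()
for step, batch in enumerate(train_loader):  # train_loader yields batches of size 3
    labels = batch["labels"].cuda()
    outputs = model(
        input_ids=batch["input_ids"].cuda(),
        attention_mask=batch["attention_mask"].cuda(),
        labels=labels,
    )
    # recompute the loss with label smoothing instead of using outputs.loss
    logits = outputs.logits
    loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip-norm 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```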

Loss curve (my fault that it is hard to read):

The model does not seem to converge.

ROUGE scores before training:

"rouge1": 37.325248080026086,
"rouge2": 18.03751262341448,
"rougeL": 25.82438158757282

ROUGE scores after training:

"rouge1": 37.818,
"rouge2": 17.246,
"rougeL": 23.7038


AFAIK there isn't a bart-base checkpoint trained for summarization; how did you get those metrics without training?

I directly calculated the ROUGE score on CNN/DailyMail before training.
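
Roughly like the sketch below (assuming the `datasets` and `evaluate` libraries; the beam-search settings and the 100-example subset are illustrative, not exactly what I ran):

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").cuda().eval()
rouge = evaluate.load("rouge")

test_set = load_dataset("cnn_dailymail", "3.0.0", split="test")

preds, refs = [], []
for example in test_set.select(range(100)):  # small subset for a quick check
    inputs = tokenizer(example["article"], truncation=True, max_length=1024,
                       return_tensors="pt").to("cuda")
    with torch.no_grad():
        ids = model.generate(**inputs, num_beams=4, max_length=142, min_length=56)
    preds.append(tokenizer.decode(ids[0], skip_special_tokens=True))
    refs.append(example["highlights"])

print(rouge.compute(predictions=preds, references=refs))
```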

I think there are bugs in my code, but I have no idea what kind of bug would cause this problem.