Update: problem solved. It was caused by a bug in the model-saving function.
There is not much difference in the ROUGE scores before and after training, and the generated summaries of the same article are almost identical. I need some suggestions.
I used the original fairseq settings:
- criterion: label_smoothed_cross_entropy
- label-smoothing: 0.1
- attention-dropout: 0.1
- weight-decay: 0.01
- lr-scheduler: polynomial_decay
- adam-betas: (0.9, 0.999)
- adam-eps: 1e-08
- lr: 3e-05
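For context, that criterion is a label-smoothed negative log-likelihood; a minimal sketch of the computation (my own paraphrase of the usual formulation, not fairseq's exact code, and without padding masking) is:

```python
import torch

def label_smoothed_nll_loss(lprobs, target, epsilon=0.1):
    # lprobs: (N, vocab) log-probabilities; target: (N,) gold token ids.
    nll_loss = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    smooth_loss = -lprobs.sum(dim=-1)            # uniform-smoothing term over the vocab
    eps_i = epsilon / lprobs.size(-1)
    # Mix the one-hot NLL with the uniform term, weighted by label-smoothing 0.1.
    return ((1.0 - epsilon) * nll_loss + eps_i * smooth_loss).sum()
```

It is applied to the log-softmax of the decoder logits over the vocabulary.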
Some differences in my setup (rough sketch below):
- batch-size: 3
- gradient_accumulation_steps: 16
- optimizer: AdamW
- fp32
- clip-norm: 1.0
Trained for 7 hours on 2 Tesla P4 GPUs.
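The optimization step with those differences looks roughly like this (a simplified sketch with a placeholder model, loss, and data, not my actual fine-tuning script; the polynomial_decay LR schedule is omitted):

```python
import torch
from torch.optim import AdamW
from torch.nn.utils import clip_grad_norm_

# Placeholder model and data; the real run uses the summarization model and dataset.
model = torch.nn.Linear(512, 512)
data = [torch.randn(3, 512) for _ in range(64)]            # batch-size 3

optimizer = AdamW(model.parameters(), lr=3e-05, betas=(0.9, 0.999),
                  eps=1e-08, weight_decay=0.01)
accum_steps = 16                                           # gradient_accumulation_steps

model.train()                                              # fp32 throughout (no AMP)
optimizer.zero_grad()
for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean()                      # placeholder loss
    (loss / accum_steps).backward()                        # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        clip_grad_norm_(model.parameters(), 1.0)           # clip-norm 1.0
        optimizer.step()
        optimizer.zero_grad()
```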
Loss curve (my fault that it is hard to read):
The model does not seem to converge.
ROUGE scores before training:
- rouge1: 37.325248080026086
- rouge2: 18.03751262341448
- rougeL: 25.82438158757282

ROUGE scores after training:
- rouge1: 37.818
- rouge2: 17.246
- rougeL: 23.7038