Update: problem solved. It was caused by a bug in the model-saving function.
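For anyone hitting the same symptom: a common form of this bug class is saving untrained weights, or never loading the trained ones back before evaluation. A minimal sketch of the explicit save/load pattern (the model here is a hypothetical stand-in, not my actual setup):

```python
import torch
from torch import nn

model = nn.Linear(8, 2)  # hypothetical stand-in for the fine-tuned model

# Save the trained weights explicitly after training finishes...
torch.save(model.state_dict(), "checkpoint_best.pt")

# ...and load them back before generation. Skipping this load (or saving a
# freshly initialized model by mistake) reproduces exactly the symptom below:
# near-identical Rouge scores before and after training.
model.load_state_dict(torch.load("checkpoint_best.pt"))
```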
There is not much difference in the Rouge scores before and after training, and the generated summaries of the same article are almost identical. Any suggestions would be appreciated.
I used the original fairseq settings:
- criterion: label_smoothed_cross_entropy (see the sketch after this list)
- label-smoothing: 0.1
- attention-dropout: 0.1
- weight-decay: 0.01
- lr-scheduler: polynomial_decay
- adam-betas: "(0.9, 0.999)"
- adam-eps: 1e-08
- lr: 3e-05
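For reference, a minimal sketch of what the label_smoothed_cross_entropy criterion computes (my own simplified version, ignoring padding, roughly following fairseq's implementation):

```python
import torch

def label_smoothed_nll_loss(lprobs: torch.Tensor, target: torch.Tensor,
                            epsilon: float = 0.1) -> torch.Tensor:
    """Label-smoothed NLL loss over log-probabilities, simplified from fairseq."""
    # lprobs: (N, V) log-probabilities; target: (N,) gold token indices
    nll_loss = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    smooth_loss = -lprobs.sum(dim=-1)  # mass spread uniformly over the vocabulary
    eps_i = epsilon / (lprobs.size(-1) - 1)
    loss = (1.0 - epsilon - eps_i) * nll_loss + eps_i * smooth_loss
    return loss.mean()

# Tiny usage example with random log-probs over a 10-token vocabulary.
lprobs = torch.log_softmax(torch.randn(4, 10), dim=-1)
target = torch.randint(0, 10, (4,))
print(label_smoothed_nll_loss(lprobs, target, epsilon=0.1))
```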
Some differences (see the training-step sketch after this list):
- batch-size: 3
- gradient_accumulation_steps: 16
- optimizer: AdamW
- fp32 (no mixed precision)
- clip-norm: 1.0
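For concreteness, a minimal PyTorch sketch of a training step with those settings (AdamW, accumulation over 16 steps, gradient clipping at 1.0); the model and data here are toy stand-ins, not the actual summarization setup:

```python
import torch
from torch import nn
from torch.optim import AdamW

# Toy stand-ins; in the real run these would be the summarization model and data.
model = nn.Linear(16, 4)
data = [(torch.randn(3, 16), torch.randint(0, 4, (3,))) for _ in range(64)]  # batch size 3
loss_fn = nn.CrossEntropyLoss()

optimizer = AdamW(model.parameters(), lr=3e-05, betas=(0.9, 0.999),
                  eps=1e-08, weight_decay=0.01)
ACCUM_STEPS = 16  # gradient_accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / ACCUM_STEPS  # scale so grads match one large batch
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip-norm 1.0
        optimizer.step()
        optimizer.zero_grad()
```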
Trained for 7 hours on 2 Tesla P4s.
Loss curve (my fault for making it hard to read):
The model does not seem to converge.
Rouge scores before training:
"rouge1": 37.325248080026086,
"rouge2": 18.03751262341448,
"rougeL": 25.82438158757282
Rouge scores after training:
"rouge1": 37.818,
"rouge2": 17.246,
"rougeL": 23.7038
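For anyone comparing against these numbers, here is one way such scores can be computed; I'm assuming Google's rouge-score package here (not necessarily what produced the values above), which reports them as F1 on a 0-1 scale:

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "the reference summary of the article",  # placeholder reference
    "the generated summary of the article",  # placeholder hypothesis
)
for name, result in scores.items():
    print(name, result.fmeasure * 100)  # scale to match the values above
```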