Hello, I have reproduced the bart-large-xsum results starting from bart-large on the XSUM dataset, which I am very happy about. However, looking at the wandb plots, I see the training loss jumping around and the validation loss behaving very strangely. Here you can see the two plots (the change of color just means that I stopped and restarted from a checkpoint).
On the other hand, the ROUGE score on the validation set keeps improving.
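For context, the jumps are easier to judge after smoothing. Here is a minimal Python sketch of the kind of exponential moving average the wandb UI smoothing slider applies (the `losses` list is hypothetical, standing in for values exported from wandb):

```python
# Exponential moving average over logged training-loss values, similar to
# the smoothing applied in the wandb UI. Helps distinguish noise around a
# decreasing trend from a genuinely unstable loss.

def ema(losses, alpha=0.05):
    """Return the exponentially smoothed version of `losses`."""
    smoothed, running = [], losses[0]
    for loss in losses:
        running = alpha * loss + (1 - alpha) * running
        smoothed.append(running)
    return smoothed

losses = [3.1, 2.4, 2.9, 2.2, 2.6, 2.0, 2.3]  # hypothetical exported values
print(ema(losses))
```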
Here are the parameters (a quick sketch of what they imply numerically follows the list):
--fp16 --warmup_steps 50
--attention_dropout 0 --dropout 0.1
--logging_steps 10 --save_steps 500 --save_total_limit 100
--per_device_train_batch_size 1 --per_device_eval_batch_size 1
--gradient_accumulation_steps 128 --eval_accumulation_steps 16
--do_eval --evaluation_strategy steps --eval_steps 500
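For reference, this is the update cadence these flags give me (the single-GPU count is my assumption):

```python
# Effective batch size and update cadence implied by the flags above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 128
n_gpus = 1  # assumption; multiply by the actual GPU count if different

# Each optimizer step accumulates gradients over this many examples:
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * n_gpus
print(effective_batch_size)  # 128

# --logging_steps 10 logs every 10 optimizer steps = 1,280 examples;
# --eval_steps 500 evaluates every 500 optimizer steps = 64,000 examples.
print(10 * effective_batch_size)   # 1280
print(500 * effective_batch_size)  # 64000
```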
Do you have any idea why these two plots are behaving this way?