BART fine-tuning on XSUM - jumpy train loss, weird eval loss

Hello, I have reproduced the bart-large-xsum results starting from bart-large on the XSUM dataset, and I am very happy with that result. However, looking at the wandb plots, I see the training loss jumping around and the validation loss behaving strangely. Here you can see the two plots (the change of color just means I stopped and restarted from a checkpoint).

On the other hand, the ROUGE score on the validation set keeps improving.
Here are the parameters:

!python3 $finetune_script \
  --model_name_or_path $model_name_or_path \
  --config_name $model_name_or_path \
  --tokenizer_name $model_name_or_path \
  --data_dir $data_dir \
  --fp16 --warmup_steps 50 \
  --attention_dropout 0 --dropout 0.1 \
  --learning_rate 1.2e-4 \
  --sortish_sampler --freeze_embeds \
  --task summarization \
  --max_source_length 1024 \
  --max_target_length 60 \
  --val_max_target_length 60 \
  --do_train \
  --max_steps 5000 \
  --n_val 500 \
  --logging_steps 10 --save_steps 500 --save_total_limit 100 \
  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 128 --eval_accumulation_steps 16 \
  --do_eval --evaluation_strategy steps --eval_steps 500 \
  --predict_with_generate \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --seed $config.SEED \
  --run_name $output_dir
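
For reference, the effective batch size and approximate number of epochs these flags imply can be worked out as below. This is only a quick sanity-check sketch: it assumes a single GPU and the standard XSUM train split of about 204k articles, neither of which is spelled out in the command above, so adjust num_gpus for your setup.

# Rough sanity check of the training budget implied by the flags above.
# Assumptions (not in the command itself): one GPU and the standard XSUM
# train split of 204,045 articles.

per_device_train_batch_size = 1
gradient_accumulation_steps = 128
num_gpus = 1                      # assumed; change to match your hardware
max_steps = 5000
xsum_train_size = 204_045         # standard XSUM train split

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
approx_epochs = examples_seen / xsum_train_size

print(f"effective batch size: {effective_batch_size}")   # 128
print(f"examples seen:        {examples_seen}")          # 640000
print(f"approx. epochs:       {approx_epochs:.2f}")      # ~3.1

So each logged training-loss point averages over 128 examples, and the 5000 optimizer steps correspond to roughly three passes over the training set under these assumptions.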

Do you have any idea why these two plots are behaving this way?