BART fine-tuning on XSUM - jumpy train loss, weird eval loss

Hello, I have reproduced the bart-large-xsum results starting from bart-large on the XSUM dataset, and I am very happy with that result. However, looking at the wandb plots, I see the training loss jumping around and the validation loss behaving strangely. Here you can see the two plots (the change of color just means I stopped and restarted from a checkpoint).

On the other hand, the ROUGE score on the validation set keeps improving.
Here are the parameters:

!python3 $finetune_script \
  --model_name_or_path $model_name_or_path \
  --config_name $model_name_or_path \
  --tokenizer_name $model_name_or_path \
  --data_dir $data_dir \
  --fp16 --warmup_steps 50 \
  --attention_dropout 0 --dropout 0.1 \
  --learning_rate 1.2e-4 \
  --sortish_sampler --freeze_embeds \
  --task summarization \
  --max_source_length 1024 \
  --max_target_length 60 \
  --val_max_target_length 60 \
  --do_train \
  --max_steps 5000 \
  --n_val 500 \
  --logging_steps 10 --save_steps 500 --save_total_limit 100 \
  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 128 --eval_accumulation_steps 16 \
  --do_eval --evaluation_strategy steps --eval_steps 500 \
  --predict_with_generate \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --seed $config.SEED \
  --run_name $output_dir
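
For reference, the effective batch size and approximate number of epochs these flags imply can be worked out as below. This is only a quick sanity-check sketch: it assumes a single GPU and the standard XSUM train split of about 204k articles, neither of which is spelled out in the command above, so adjust num_gpus for your setup.

# Rough sanity check of the training budget implied by the flags above.
# Assumptions (not in the command itself): one GPU and the standard XSUM
# train split of 204,045 articles.

per_device_train_batch_size = 1
gradient_accumulation_steps = 128
num_gpus = 1                      # assumed; change to match your hardware
max_steps = 5000
xsum_train_size = 204_045         # standard XSUM train split

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
approx_epochs = examples_seen / xsum_train_size

print(f"effective batch size: {effective_batch_size}")   # 128
print(f"examples seen:        {examples_seen}")          # 640000
print(f"approx. epochs:       {approx_epochs:.2f}")      # ~3.1

So each logged training-loss point averages over 128 examples, and the 5000 optimizer steps correspond to roughly three passes over the training set under these assumptions.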

Do you have any idea why these two plots are behaving this way?