Reproduce results on CNN/DailyMail - PEGASUS

I am trying to fine-tune the Pegasus implementation in transformers on the CNN/DailyMail dataset, starting from the 'google/pegasus-large' checkpoint. However, I have been unable to reach the reported numbers (Pegasus: replication and distillation results · Issue #6844 · huggingface/transformers · GitHub). My results are ROUGE-1: 43.7 and ROUGE-L: 40.6. I assume I need to change some hyperparameters.

I would be grateful for any comments or advice.

P.S. These are the hyperparameters I have tried (values in parentheses are alternatives I also tried):

max_input_length=1024,
max_output_length=128,
freeze_encoder=False,
freeze_embeds=True,
learning_rate=1e-4 (1e-3),
weight_decay=0.0,
adam_epsilon=1e-8,
warmup_steps=10000,
gradient_accumulation_steps=8,
fp_16=False,
opt_level='O1',
max_grad_norm=1.0,
num_train_epochs=20 (10),
train_batch_size=4,
eval_batch_size=16

One thing I can tell you is that you should use Adafactor with PEGASUS, not Adam. From the paper:

For optimization, both pre-training and fine-tuning used Adafactor (Shazeer & Stern, 2018) with square root learning rate decay and dropout rate of 0.1.
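A minimal sketch of what that could look like with the `Adafactor` class from transformers, assuming an explicit learning rate combined with an inverse square root decay schedule. The `BASE_LR` and `WARMUP_STEPS` values and the tiny stand-in model are hypothetical placeholders, not settings from the paper or from this thread:

```python
import math

import torch
from transformers.optimization import Adafactor

# Stand-in module so the sketch runs without downloading a checkpoint;
# in practice this would be the PEGASUS model being fine-tuned.
model = torch.nn.Linear(8, 8)

# Hypothetical values for illustration only.
BASE_LR = 1e-4
WARMUP_STEPS = 1000

# An explicit lr requires relative_step=False (and therefore warmup_init=False).
optimizer = Adafactor(
    model.parameters(),
    lr=BASE_LR,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)


def inv_sqrt_decay(step: int) -> float:
    """Linear warmup to 1.0, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    return min(step / WARMUP_STEPS, math.sqrt(WARMUP_STEPS / step))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inv_sqrt_decay)

# One dummy optimization step to show the call order.
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

Whether this exact schedule matches the one used in the paper is an assumption; the quote only says "square root learning rate decay".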

I wonder whether warmup steps and weight decay have a big impact on the results? Sorry for going into so much detail; I have been fine-tuning for months, but the results are still not what I expected :pensive:
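As a rough way to put the warmup setting in context, the arithmetic below relates `warmup_steps` to the number of optimizer steps per epoch implied by the hyperparameters listed above. It assumes a single device and a training split of 287,113 examples (the size of the commonly used CNN/DailyMail train split); both are assumptions, so adjust them to the actual setup:

```python
import math

# Values taken from the hyperparameter list in the first post.
train_batch_size = 4
gradient_accumulation_steps = 8
warmup_steps = 10_000
num_train_epochs = 20

# Assumptions: one device, and the size of the CNN/DailyMail train split.
num_devices = 1
num_train_examples = 287_113

# Number of examples consumed per optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

# How much of training is spent warming up.
warmup_epochs = warmup_steps / steps_per_epoch
warmup_fraction = warmup_steps / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"warmup covers {warmup_epochs:.2f} epochs ({warmup_fraction:.1%} of all steps)")
```

Under these assumptions the warmup phase lasts slightly longer than one full epoch, which may be worth keeping in mind when comparing runs with different epoch counts.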