I am working on generating summaries with SOTA models, and recently I have been using Pegasus-based models. It surprises me that google/pegasus-xsum returns excellent results while google/pegasus-cnn_dailymail performs very badly.
I am running the code from transformers/examples/legacy/seq2seq in the huggingface/transformers GitHub repository, and my transformers version is 4.2.0. The exact command is:
python -m torch.distributed.launch --nproc_per_node=3 run_distributed_eval.py \
--model_name google/pegasus-cnn_dailymail \
--save_dir $OUTPUT_DIR \
--data_dir $DATA_DIR \
--bs 32 \
--fp16
The performance from google/pegasus-xsum:
{'rouge1': 47.0271, 'rouge2': 24.4924, 'rougeL': 39.2529}
The performance from google/pegasus-cnn_dailymail:
{'rouge1': 0.1602, 'rouge2': 0.084, 'rougeL': 0.1134}
I also checked the test_generations.txt file to figure out the reason for the abnormally low ROUGE scores, and it turns out that most of the lines are blank. Has anyone run into a similar problem when generating summaries with google/pegasus-cnn_dailymail?
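For what it's worth, here is a minimal sanity check I would try: generate a single summary with the model loaded in default fp32 precision, outside the distributed eval script. This is a sketch under two assumptions, neither confirmed by the run above: first, that the --fp16 flag may be implicated in the blank generations (the Pegasus checkpoints were trained in bfloat16, so fp16 inference can behave differently), and second, that google/pegasus-cnn_dailymail emits the literal token "<n>" as a sentence separator, which should be converted back to newlines before line-based ROUGE scoring. The helper names below are my own, not from the example script.

```python
def clean_pegasus_summary(text: str) -> str:
    # pegasus-cnn_dailymail decodes sentence breaks as the literal string
    # "<n>"; restore real newlines so line-based ROUGE scoring works.
    return text.replace("<n>", "\n").strip()


def summarize(article: str, model_name: str = "google/pegasus-cnn_dailymail") -> str:
    # Imports kept local so the cleanup helper above is usable without
    # transformers installed. Model is loaded in default fp32 precision
    # (no fp16), to test whether blank outputs persist without half precision.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    batch = tokenizer(article, truncation=True, padding="longest", return_tensors="pt")
    generated = model.generate(**batch)
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
    return clean_pegasus_summary(decoded)
```

If summarize() produces a reasonable non-empty summary in fp32, that would point at the fp16 path rather than the checkpoint itself.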