Higher GLUE scores on bert-base-uncased than reported in the BERT paper

Following the GLUE example in the huggingface/transformers GitHub repo (main branch), I ran fine-tuning for bert-base-uncased and got numbers similar to those in the repo's table. However, many task scores are much higher than the numbers reported in the BERT paper.
For example, the Matthews correlation for CoLA is 56.53 in the reported table, and 57.78 for my fine-tuned run, but only 52.1 in Table 1 of the BERT paper.
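For reference, the CoLA metric being compared here is the Matthews correlation coefficient (MCC). A minimal pure-Python sketch of it for binary labels (equivalent to `sklearn.metrics.matthews_corrcoef`, shown only to make the metric concrete; the actual evaluation script computes it for you):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any confusion-matrix margin is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(matthews_corrcoef([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement -> 1.0
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 1, 0]))  # chance level -> 0.0
```

MCC ranges from -1 to 1, so a few points of difference (52.1 vs. 56.53) is a noticeable gap, which is why the discrepancy with the paper stands out.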
Could anyone explain this? Am I missing something? Thanks!