Transformers BERT QA Task: run_qa.py vs run_qa_no_trainer.py

I want to use the transformers library to prune a BERT model on the downstream task SQuAD v1.1. There are two example scripts in transformers for this task: run_qa.py and run_qa_no_trainer.py.

Because I need to add some extra code to the training loop (for pruning), I chose the script that does not use the Trainer API.

However, I can't reproduce the SQuAD v1.1 results with this script. Can anyone share a set of hyperparameters for run_qa_no_trainer.py that will train a BERT model to an F1 score of about 88 on SQuAD v1.1?
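For context, here is the kind of run I have been trying. This is only a sketch based on the hyperparameters documented in the transformers question-answering examples README (max_seq_length 384, doc_stride 128, lr 3e-5, 2 epochs); the output directory is a placeholder, and it assumes `accelerate` is already configured for multi-GPU:

```shell
# Hedged example: README-style hyperparameters, not a verified recipe
# for this exact multi-GPU setup. With 4 GPUs the effective batch size
# is 4 x per_device_train_batch_size, so you may need to adjust it
# (and possibly the learning rate) to match single-GPU results.
accelerate launch run_qa_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --max_seq_length 384 \
  --doc_stride 128 \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --output_dir ./bert-squad-no-trainer  # placeholder path
```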

(I have 4 RTX 2080 Ti GPUs.)

Alternatively, does run_qa.py apply some optimization that run_qa_no_trainer.py does not? Should I switch to run_qa.py?
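If switching is the answer, the Trainer-based script is documented in the same examples README with a command along these lines, which reportedly reaches F1 around 88.5 with bert-base-uncased (again a sketch; the output directory is a placeholder):

```shell
# Hedged example of the Trainer-based script; with multiple GPUs
# visible, Trainer handles data parallelism itself.
python run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --max_seq_length 384 \
  --doc_stride 128 \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --output_dir ./bert-squad  # placeholder path
```

But I would still prefer the no-trainer variant, since injecting pruning code into an explicit training loop is easier than subclassing Trainer.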

Thank you very much.