Does it make sense that continuing to train BERT on a Wikipedia corpus drops the GLUE score?

I load the original weights via `from_pretrained` and continue training BERT with `Trainer`. The corpus I use is Wikipedia (`20220301.en`).
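
For context, my continued-pretraining setup is roughly the minimal sketch below (assuming `bert-base-uncased` and standard MLM masking; the hyperparameters shown are placeholders, not my exact values, which are in the notebook linked at the end):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load the original pretrained weights (assuming bert-base-uncased here).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# The Wikipedia dump I continue training on.
wiki = load_dataset("wikipedia", "20220301.en", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)

# Standard BERT-style masked language modeling (15% masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Placeholder hyperparameters; my real ones are in the notebook.
args = TrainingArguments(
    output_dir="bert-continued",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```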

After continued training, performance on the GLUE benchmark drops by more than 0.5 points. For instance, the average RTE score decreases from 62.45 to 61.49.
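
To get the GLUE numbers, I fine-tune the continued-pretrained checkpoint on each task and evaluate on the validation split. Roughly like this for RTE (again just a sketch; the checkpoint path and fine-tuning hyperparameters below are placeholders):

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder path to the continued-pretrained checkpoint.
checkpoint = "bert-continued"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

rte = load_dataset("glue", "rte")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

rte = rte.map(tokenize, batched=True)

metric = evaluate.load("glue", "rte")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

# Placeholder fine-tuning hyperparameters.
args = TrainingArguments(output_dir="rte-finetune", num_train_epochs=3, learning_rate=2e-5)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=rte["train"],
    eval_dataset=rte["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports RTE accuracy on the validation set
```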

I’m wondering why. Since the original BERT was also trained on Wikipedia (and BookCorpus), I expected the average GLUE score to stay nearly the same.

Or is my training pipeline wrong somewhere? Here is the notebook: continue-train-20220615-show-original-arch.ipynb - Google Drive

I’m working entirely with the Hugging Face API.