Does it make sense that continuing to train BERT on a Wikipedia corpus drops the GLUE score?

I load the original weights via `from_pretrained` and continue training BERT with `Trainer`. The corpus I use is Wikipedia (`20220301.en`).
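
For context, my continued-pretraining setup is roughly the minimal sketch below (assuming `bert-base-uncased` and standard MLM masking; the hyperparameters shown are placeholders, not my exact values, which are in the notebook linked at the end):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load the original pretrained weights (assuming bert-base-uncased here).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# The Wikipedia dump I continue training on.
wiki = load_dataset("wikipedia", "20220301.en", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)

# Standard BERT-style masked language modeling (15% masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Placeholder hyperparameters; my real ones are in the notebook.
args = TrainingArguments(
    output_dir="bert-continued",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```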

After continued training, performance on the GLUE benchmark drops by more than 0.5 points. For instance, the average RTE score decreases from 62.45 to 61.49.
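
To get the GLUE numbers, I fine-tune the continued-pretrained checkpoint on each task and evaluate on the validation split. Roughly like this for RTE (again just a sketch; the checkpoint path and fine-tuning hyperparameters below are placeholders):

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder path to the continued-pretrained checkpoint.
checkpoint = "bert-continued"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

rte = load_dataset("glue", "rte")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

rte = rte.map(tokenize, batched=True)

metric = evaluate.load("glue", "rte")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

# Placeholder fine-tuning hyperparameters.
args = TrainingArguments(output_dir="rte-finetune", num_train_epochs=3, learning_rate=2e-5)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=rte["train"],
    eval_dataset=rte["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports RTE accuracy on the validation set
```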

I’m wondering why. Since the original BERT was also trained on Wikipedia (and BookCorpus), I expected the average GLUE score to stay nearly the same.

Or is my training pipeline wrong somewhere? Here is the notebook: continue-train-20220615-show-original-arch.ipynb - Google Drive

I’m working entirely with the Hugging Face API.