Framework for Continual Pretraining

I am running continual pretraining (domain-adaptive pretraining) on bert-base-uncased with domain data, with the goal of improving the model's accuracy on downstream domain-specific tasks. So far I am not seeing any improvement on those tasks when I start from the domain-adapted model, even after fine-tuning. I want to confirm that I'm using Transformers correctly (a rough sketch of my current setup is below the questions). Specifically:

  1. Should I use BertForMaskedLM or BertForPreTraining as my starting model?
  2. Should I train a tokenizer from scratch on the domain data or use a preexisting one?
  3. Is having ~1M text records (each around a paragraph in length) roughly enough for continual pretraining?
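
For reference, here is a minimal sketch of the MLM setup I have in mind, starting from BertForMaskedLM and the original bert-base-uncased tokenizer. The corpus filename, sequence length, and hyperparameters below are placeholders, not my exact values:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the pretrained checkpoint and its original tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Domain corpus: one paragraph-length record per line (placeholder path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="bert-domain-adapted",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

# Save the adapted model and tokenizer for downstream fine-tuning.
trainer.save_model("bert-domain-adapted")
tokenizer.save_pretrained("bert-domain-adapted")
```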

Thanks!