Pretraining Models from Scratch vs Further Training

Ghada1997 · November 28, 2023, 9:30pm

Hi,

I want to pretrain an Arabic BERT model on domain-specific data so that it suits a specific task: classifying citizen reviews about government services into the relevant government sectors. From my reading, domain-specific models generally outperform general-purpose ones on in-domain tasks. So my plan is to pretrain the model on freely available Arabic newspaper articles that cover the same sectors as the government ones, including education, healthcare, etc. I know these articles are not a close match for the target domain, but they are the only suitable data available. I plan to pretrain on only around 20K articles, since I am limited in time and computational resources. The target dataset contains about 2K citizen reviews written in Modern Standard Arabic.

So, I have several questions concerning this project:

1. Would it be beneficial to pretrain an Arabic BERT model from scratch on this small dataset of 20K samples, or is that too small for my problem?
2. Would it be better to further pretrain an existing Arabic BERT model, that is, start from its pretrained weights and continue pretraining on the 20K samples? I am afraid this will cause the model to forget previously learned knowledge. Also, the mix of general and domain-specific knowledge might affect the model's performance on the target dataset of citizen reviews.
3. Whichever method I choose, should I pretrain the model on unlabeled data (unsupervised learning), or is it better to train it on labeled data so that it is useful for text classification?
4. After pretraining the model, should I apply feature extraction or fine-tuning on the target dataset of citizen reviews?
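
For context on question 3: BERT-style pretraining usually uses masked language modeling (MLM), where the prediction targets come from the raw text itself, so no sector labels are needed at that stage. The sketch below illustrates this under stated assumptions: it follows the commonly described scheme of selecting about 15% of tokens and replacing 80% of those with `[MASK]`, 10% with a random token, and leaving 10% unchanged. The function name `mask_tokens`, the whitespace tokenization, and the English toy sentence are hypothetical and for illustration only; a real setup would use the model's own tokenizer and a library data collator.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one MLM training example from unlabeled tokens.

    Returns (inputs, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else
    (positions with None would be ignored by the loss).
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position selected: the model must recover the original token.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random vocabulary token
            else:
                inputs.append(tok)                # 10%: keep the token as-is
        else:
            # Position not selected: input unchanged, no target.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

# Toy example (assumed data): whitespace tokens stand in for real subwords.
tokens = "the ministry opened a new hospital in the city".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

The labeled citizen reviews would only come into play afterwards, when a classification head is trained on top of the pretrained encoder.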
