Domain-specific pretraining with BERT models vs. smaller model architectures

I have around 4K target examples of Arabic citizen reviews of government services, and I want to apply transfer learning to improve performance on the target task, which is classifying the reviews into the relevant government sectors (education, healthcare, etc.). I plan to use a source dataset of 30-40K Arabic newspaper articles that cover the same sectors as the government ones, including education and healthcare. I know these articles are not very specific to the target domain, but they are the only suitable data available.

My plan is to compare fine-tuning an Arabic BERT model on the target data against pretraining a model from scratch on the source data and then fine-tuning it on the target data.
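
To make the comparison concrete, the direct fine-tuning baseline I have in mind looks roughly like the sketch below (Hugging Face Transformers; the checkpoint name, CSV path, and number of sectors are placeholders for my actual setup):

```python
# Sketch of the direct fine-tuning baseline: Arabic BERT -> sector classifier.
# The checkpoint name, CSV path, and number of sectors are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_name = "aubmindlab/bert-base-arabertv2"  # placeholder Arabic BERT checkpoint
num_sectors = 6                                # e.g. education, healthcare, ...

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=num_sectors
)

# Reviews stored as a CSV with a "text" column and an integer "label" column.
reviews = load_dataset("csv", data_files={"train": "reviews_train.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

reviews = reviews.map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="arabert-finetuned",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=reviews["train"],
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # dynamic padding per batch
).train()
```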

So my question is: should I apply Task-Adaptive Pretraining (TAPT), i.e. further pretrain the Arabic BERT model on the task-related data (the 30-40K newspaper articles) and then fine-tune it on the target data, and compare that against fine-tuning the Arabic BERT model directly on the target data without any further pretraining?
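
By further pretraining I mean continuing the masked-language-model objective on the articles and then fine-tuning the resulting checkpoint exactly as in the sketch above, roughly like this (again, the checkpoint name, file path, and hyperparameters are placeholders):

```python
# Sketch of the TAPT step: continue masked-language-model pretraining of the
# same Arabic BERT checkpoint on the 30-40K newspaper articles. The saved
# checkpoint would then be fine-tuned on the reviews as in the previous sketch.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "aubmindlab/bert-base-arabertv2"  # placeholder Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm_model = AutoModelForMaskedLM.from_pretrained(model_name)

# Newspaper articles stored one per line in a plain-text file.
articles = load_dataset("text", data_files={"train": "newspaper_articles.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

articles = articles.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

Trainer(
    model=mlm_model,
    args=TrainingArguments(
        output_dir="arabert-tapt",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        save_strategy="epoch",
    ),
    train_dataset=articles["train"],
    data_collator=collator,
).train()
```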

Or is it better to pretrain a smaller model architecture (an LSTM, or maybe a lite BERT variant) from scratch on the source data, to account for its small size, and then fine-tune that model on the target data of citizen reviews?
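
For the lite-BERT version of this from-scratch option, what I picture is initializing a reduced BERT configuration with random weights and pretraining it on the same article corpus, something like the sketch below; it reuses `tokenizer`, `articles`, and `collator` from the TAPT sketch above, and all the sizes are just illustrative guesses for a 30-40K-article corpus:

```python
# Sketch of pretraining a much smaller BERT-style model from scratch on the
# source articles (weights randomly initialized; only the pretrained tokenizer
# is reused). Assumes `tokenizer`, `articles`, and `collator` from the TAPT
# sketch above; all sizes below are illustrative.
from transformers import BertConfig, BertForMaskedLM, Trainer, TrainingArguments

small_config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=256,          # vs. 768 in BERT-base
    num_hidden_layers=4,      # vs. 12 in BERT-base
    num_attention_heads=4,
    intermediate_size=1024,
)
small_model = BertForMaskedLM(small_config)  # no pretrained weights

Trainer(
    model=small_model,
    args=TrainingArguments(
        output_dir="small-bert-scratch",
        num_train_epochs=10,             # a small model on a small corpus may need more epochs
        per_device_train_batch_size=32,
    ),
    train_dataset=articles["train"],
    data_collator=collator,
).train()
```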

Also, is it valid to compare direct fine-tuning with pretraining from scratch using different model architectures? Or should I use the same architecture for the comparison to be a fair experiment?