Intermediate Fine-Tuning vs. Domain-Adaptive Pretraining vs. Task-Adaptive Pretraining

I want to apply transfer learning to classify citizen reviews about government services into their relevant government sectors.

I have around 4K labeled citizen reviews as target data, and around 30K newspaper articles labeled with categories similar to the target labels (healthcare, education, etc.). This gives me four options, and I would like advice on the best approach to follow:

  1. Apply intermediate fine-tuning of a BERT model on the labeled newspaper articles, then fine-tune on the target data.

  2. Apply domain-adaptive pretraining by further pretraining BERT on the newspaper articles without their labels, then fine-tune the model on the target data.

  3. Apply task-adaptive pretraining by further pretraining BERT on a portion of the citizen reviews without their labels, then fine-tune the model on the remaining labeled portion of the target data.

  4. Combine approaches 2 and 3.
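For option 3, one practical detail is keeping the unlabeled pretraining portion and the labeled fine-tuning portion disjoint, as described above. A minimal sketch of that split in plain Python (the function name, fraction, and seed are my own illustrative choices, not from any particular library):

```python
import random

def tapt_split(reviews, labels, pretrain_frac=0.5, seed=0):
    """Split labeled target data into (a) unlabeled texts for
    task-adaptive pretraining (e.g., masked language modeling) and
    (b) a disjoint labeled portion for supervised fine-tuning."""
    idx = list(range(len(reviews)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    cut = int(len(idx) * pretrain_frac)
    # Labels are dropped for the pretraining portion: MLM is self-supervised.
    pretrain_texts = [reviews[i] for i in idx[:cut]]
    # The rest keeps its labels for the classification fine-tuning stage.
    finetune_pairs = [(reviews[i], labels[i]) for i in idx[cut:]]
    return pretrain_texts, finetune_pairs
```

With ~4K reviews and `pretrain_frac=0.5`, this yields roughly 2K texts for continued pretraining and 2K labeled examples for fine-tuning; the trade-off is that the classifier then sees only half the labels, which is one argument for comparing against option 2, where no labeled target data is sacrificed.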

P.S. I will compare the chosen approach against directly fine-tuning BERT on the target data as a baseline.