Suppose I want to use a DistilBERT model on a new text corpus (say, a social media corpus, SMC) that differs from the data BERT was originally trained on. I see two ways to train DistilBERT:
1. Continue pre-training BERT from its checkpoint on the SMC data to obtain SMC-BERT, then train DistilBERT with SMC-BERT as the teacher model, using the SMC corpus as the train/valid/test data.
2. Train DistilBERT directly with the original BERT as the teacher model, using the SMC corpus as the train/valid/test data.
Is one of these two approaches recommended for using DistilBERT on a new corpus whose textual style differs from that of the original BERT corpus?
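
To make the difference concrete, here is a minimal sketch of the soft-target distillation term that both options share. The only thing that changes between them is which teacher produces the logits. The function names, the toy logits, and the temperature value are illustrative assumptions, not taken from any library; actual DistilBERT training also combines this term with a masked-language-modeling loss and a cosine embedding loss, which are left out here.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over a 3-token vocabulary at one masked position.
adapted_teacher = [0.5, 3.0, 0.1]  # stands in for SMC-BERT (option 1)
generic_teacher = [2.0, 1.0, 0.1]  # stands in for the original BERT (option 2)
student = [1.0, 1.0, 1.0]          # untrained student: uniform prediction

loss_option_1 = distillation_loss(student, adapted_teacher)
loss_option_2 = distillation_loss(student, generic_teacher)
print(loss_option_1, loss_option_2)
```

In both options the student sees the same SMC text; in option 1 the soft targets come from a teacher that has itself been adapted to that text, while in option 2 they come from a teacher that has never seen it.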