Hugging Face Forums

MiniLMv2-L6-H384-distilled-from-RoBERTa-Large for continual pre-training?

mscham March 13, 2022, 4:17pm

Does anyone know if the MiniLMv2-L6-H384-distilled-from-xxx models are suitable for continual pre-training?
I see they are tagged Fill-mask, but their mask predictions, both on the model card page and when I run them locally, seem to be gibberish.
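One possible explanation for the gibberish, which is not confirmed here, is that a checkpoint ships without trained masked-LM head weights. When `transformers` loads a model class whose weights are missing from the checkpoint, it initializes them randomly and logs a warning. A random head would produce noisy fill-mask predictions. Below is a minimal sketch of checking for this. It assumes you already have the list of missing parameter names, for example from `AutoModelForMaskedLM.from_pretrained(name, output_loading_info=True)`, whose second return value is a dict that includes `missing_keys`. The helper names and example keys are hypothetical. The `lm_head.` and `cls.predictions.` prefixes are the head prefixes used by the RoBERTa and BERT masked-LM classes in `transformers`.

```python
# Parameter-name prefixes of the masked-LM head in Hugging Face transformers:
# RobertaForMaskedLM uses "lm_head.", BertForMaskedLM uses "cls.predictions.".
MLM_HEAD_PREFIXES = ("lm_head.", "cls.predictions.")


def untrained_head_keys(missing_keys):
    """Return the missing keys that belong to the MLM head (hypothetical helper).

    `missing_keys` lists parameters that were NOT found in the checkpoint
    and were therefore randomly initialized when the model was loaded.
    """
    return sorted(k for k in missing_keys if k.startswith(MLM_HEAD_PREFIXES))


def head_is_untrained(missing_keys):
    # True if any part of the head was randomly initialized, which would
    # plausibly degrade fill-mask predictions.
    return bool(untrained_head_keys(missing_keys))


if __name__ == "__main__":
    # Made-up loading report for illustration, not from a real checkpoint.
    example_missing = ["lm_head.dense.weight", "lm_head.bias"]
    print(untrained_head_keys(example_missing))
    print(head_is_untrained(example_missing))
```

If the head does turn out to be untrained, the encoder weights can still be a reasonable starting point for continued masked-LM training. The head would simply have to be learned from scratch.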

I really like these models because they are so small and fast (and perform really well), but I'm wondering whether I'd be better off switching to DistilBERT or something else if pre-training with in-domain vocabulary is something I want to explore.
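Continual pre-training of a fill-mask model usually means continuing the masked-language-modeling objective on in-domain text. Below is a minimal stdlib sketch of the BERT-style masking step. It assumes the common recipe: select about 15% of non-special tokens as targets, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. In `transformers`, `DataCollatorForLanguageModeling` with `mlm=True` applies this recipe (its `mlm_probability` defaults to 0.15). The function below is an illustration, not that library's code. `mask_id`, `vocab_size`, and `special_ids` are hypothetical parameters you would take from your own tokenizer.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, mlm_probability=0.15,
                special_ids=frozenset(), rng=None):
    """BERT-style masking for one sequence (illustrative sketch).

    Returns (inputs, labels). labels is -100 everywhere except at the
    selected positions, which hold the original token id to be predicted.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with probability p.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Toy ids: 0 and 2 stand in for <s> and </s>, 4 for <mask>.
    ids = [0, 11, 12, 13, 14, 15, 2]
    print(mask_tokens(ids, mask_id=4, vocab_size=50, special_ids={0, 2},
                      rng=random.Random(42)))
```

Unselected positions get the label -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so only the selected tokens contribute to the loss. Because masks are drawn fresh on every call, each epoch sees different masked positions (RoBERTa-style dynamic masking). This objective applies equally to a small MiniLM student or to DistilBERT, provided the checkpoint has, or can learn, a working MLM head.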
