Do we use pre-trained weights in Trainer?

When we use Trainer to build a language model with MLM, depending on which model we use (say, DistilBERT), does Trainer use the pre-trained weights, or are the weights supposed to be trained from scratch?

You can do either – it depends on how you create your model. Trainer just handles the training aspect, not the model initialization.

from transformers import AutoConfig, AutoModelForMaskedLM, Trainer

# Model randomly initialized (starting from scratch)
config = AutoConfig.for_model("distilbert")
# Update config if you'd like
# config.update({"param": value})
model = AutoModelForMaskedLM.from_config(config)

# Model from a pre-trained checkpoint
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

# Put model in Trainer
trainer = Trainer(model=model)

Unless you have a huge amount of data that is very different from what the pre-trained models were trained on, I wouldn't recommend starting from scratch.

Start from scratch when you are creating a model for a niche domain like a low-resource language.

Start from a pre-trained model if your text is in a high-resource language (like English) but the jargon might be very specific (like scientific texts). There are enough fundamental similarities that you’ll save compute and time by starting from a pre-trained model.


True, I was going to do Sentiment Analysis on some text data, but every model I tested over-fitted, and I did not get any good results on the validation data. So I decided to train a DistilBERT model on my own data, but I do not know whether the model starts training with pre-trained weights or with random weights from scratch.
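As above, it comes down to which constructor you call: `from_pretrained` loads the checkpoint weights, while `from_config` initializes randomly. Here is a small offline sketch illustrating that — the tiny config sizes (`dim=32`, etc.) are made up so it runs quickly and are not the real DistilBERT defaults:

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM

# Hypothetical tiny sizes so this runs fast; a real model would use the defaults.
config = AutoConfig.for_model(
    "distilbert", dim=32, hidden_dim=64, n_layers=2, n_heads=2, vocab_size=100
)

# from_config gives freshly (randomly) initialized weights on every call...
m1 = AutoModelForMaskedLM.from_config(config)
m2 = AutoModelForMaskedLM.from_config(config)
same = torch.equal(
    m1.distilbert.embeddings.word_embeddings.weight,
    m2.distilbert.embeddings.word_embeddings.weight,
)
# ...so two "from scratch" models almost surely disagree, whereas two
# from_pretrained calls would both load identical checkpoint weights.
print(same)
```

So for your sentiment-analysis case, loading the backbone with `from_pretrained` gives you the pre-trained weights (only a newly added task head would be randomly initialized, and transformers logs a warning listing any such weights).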