When we use Trainer
to build a language model with MLM, depending on which model we use (say DistilBERT), do we use the pre-trained weights in Trainer,
or are the weights supposed to be trained from scratch?
You can do either – it depends on how you create your model. Trainer just handles the training aspect, not the model initialization.
from transformers import AutoConfig, AutoModelForMaskedLM, Trainer

# Model randomly initialized (starting from scratch)
config = AutoConfig.for_model("distilbert")
# Update the config if you'd like
# config.update({"param": value})
model = AutoModelForMaskedLM.from_config(config)

# Model from a pre-trained checkpoint
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

# Put the model in Trainer
trainer = Trainer(model=model)
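To make the distinction concrete, here's a quick sketch (assuming transformers and torch are installed) showing that from_config starts from fresh random weights, since two models built from the same config end up with different parameters:

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM

# Build two models from the same config: from_config draws fresh
# random weights each time, so their parameters should differ.
config = AutoConfig.for_model("distilbert")
m1 = AutoModelForMaskedLM.from_config(config)
m2 = AutoModelForMaskedLM.from_config(config)

p1 = next(m1.parameters())
p2 = next(m2.parameters())
print(torch.equal(p1, p2))  # two independent random draws
```

With from_pretrained, by contrast, two loads of the same checkpoint would give identical weights.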
Unless you have a huge amount of data that is very different from what the pre-trained models were trained on, I wouldn’t recommend starting from scratch.
Start from scratch when you are creating a model for a niche domain like a low-resource language.
Start from a pre-trained model if your text is in a high-resource language (like English) but the jargon might be very specific (like scientific texts). There are enough fundamental similarities that you’ll save compute and time by starting from a pre-trained model.
True, I was going to do sentiment analysis over some text data, but whatever model I tested overfitted, and I did not get good results on the validation data. So I decided to train a DistilBERT model on my own data, but I do not know whether the model starts training with pre-trained weights or with random weights from scratch.
Thanks.