Is "Some weights of the model were not used" warning normal when pre-trained BERT only by MLM

Hello guys,

I’ve trained a BERT model from scratch using BertForMaskedLM and the Trainer API. When I use AutoModelForSequenceClassification to fine-tune my model for a text classification task, I get a warning about weight initialization. Is it normal to get a warning like the one below, or am I doing something wrong?

Some weights of the model checkpoint at ./cased/bert-wikidump-50mb-mlm/model were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./cased/bert-wikidump-50mb-mlm/model and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.weight', 'classifier.bias']
Loading pre-trained model with AutoModelForSequenceClassification
from transformers import AutoModelForSequenceClassification, AdamW, AutoConfig
config = AutoConfig.from_pretrained(PATHS["model"]["cased"]["local"], num_labels=df.category.unique().size)

model = AutoModelForSequenceClassification.from_pretrained(PATHS["model"]["cased"]["local"], config=config)
Code for training BERT from scratch with only the MLM task
from transformers import (
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

config = BertConfig(vocab_size=64_000)
model = BertForMaskedLM(config=config)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=bert_cased_tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir=PATHS["model"]["cased"]["training"]["local"],
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=8,  # 512 max sequence length, 64 sequence count (per_gpu_train_batch_size is deprecated)
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
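
For reference, bert_cased_tokenizer and dataset are not defined in the snippet above. A minimal sketch of how they could be built, assuming a locally saved WordPiece tokenizer and a plain-text corpus at hypothetical paths:

from transformers import BertTokenizerFast, LineByLineTextDataset

# Hypothetical paths -- substitute your own tokenizer directory and corpus file.
bert_cased_tokenizer = BertTokenizerFast.from_pretrained("./cased/tokenizer")

# LineByLineTextDataset treats each line of the text file as one training example.
dataset = LineByLineTextDataset(
    tokenizer=bert_cased_tokenizer,
    file_path="./data/wikidump-50mb.txt",
    block_size=512,
)

Training then starts with trainer.train(), and the checkpoint used above would be written with trainer.save_model(...).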

Yes, the warning is telling you that some weights were randomly initialized (here, your classification head), which is normal since you are instantiating a pretrained model for a different task. It’s there to remind you to fine-tune your model (it’s not usable for inference directly).
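
A minimal fine-tuning sketch with the same Trainer API (train_dataset here stands in for a tokenized classification dataset that is not shown in this thread, and the output path is hypothetical):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./cased/bert-wikidump-50mb-classifier",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,                  # the AutoModelForSequenceClassification loaded above
    args=training_args,
    train_dataset=train_dataset,  # tokenized classification dataset (not shown here)
)
trainer.train()  # this is where the randomly initialized classifier head gets trained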


Thanks @sgugger! 🙂

Hi @sgugger, may I know how to suppress this message? I load the pretrained model across multiple processes, so the notification is visually overwhelming. Thank you!


Is there a way to avoid having some weights randomly initialized when they have already been initialized?

To suppress this output, the Stack Overflow question "Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel" suggests changing the verbosity level of transformers.logging.
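
Concretely, that amounts to something like the following; note that it hides all transformers warnings and info messages, not just this one:

from transformers import logging

logging.set_verbosity_error()  # only errors are printed; the weight-initialization warning is suppressed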

Is there a way to use distilbert-base-uncased and other models out of the box, without fine-tuning, for benchmarking?