DebertaForMaskedLM cannot load the parameters in the MLM head from microsoft/deberta-base

Hello, I’m trying to run this code:

from transformers import DebertaTokenizer, AutoModelWithLMHead

tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')
model = AutoModelWithLMHead.from_pretrained('microsoft/deberta-base')

and get this warning:

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.LayerNorm.bias']

  • This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

It seems that the checkpoint provided by microsoft/deberta-base stores the MLM head weights under different names (lm_predictions.lm_head.*) than DebertaForMaskedLM expects (cls.predictions.*), so the pretrained head is discarded and DebertaForMaskedLM cannot be used directly for masked token prediction.
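A minimal sketch to confirm the name mismatch, assuming the repo ships a pytorch_model.bin file and that torch and huggingface_hub are available:

from huggingface_hub import hf_hub_download
import torch

# Download the raw checkpoint and inspect its parameter names.
path = hf_hub_download('microsoft/deberta-base', 'pytorch_model.bin')
state_dict = torch.load(path, map_location='cpu')

# The MLM head is stored under lm_predictions.*, while DebertaForMaskedLM
# looks for cls.predictions.*, so the pretrained head weights are dropped.
print([k for k in state_dict if k.startswith('lm_predictions')])
print([k for k in state_dict if k.startswith('cls.predictions')])  # likely empty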

Is this a bug in the DebertaForMaskedLM class, or in the checkpoint provided by Microsoft? Or am I misunderstanding something?

Thank you!


Same issue for me. Help, please?

Thanks!

I wouldn’t say it’s a complete solution, but it does initialize the weights whose names start with "lm_predictions". Check out the code here.

Thanks to @nbroad for his generous contribution.

from new_deberta import NewDebertaForMaskedLM

model_name = 'microsoft/deberta-base'
model = NewDebertaForMaskedLM.from_pretrained(model_name)

After this, I get:

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing NewDebertaForMaskedLM: ['deberta.embeddings.position_embeddings.weight']

  • This IS expected if you are initializing NewDebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing NewDebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of NewDebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['lm_predictions.lm_head.decoder.weight', 'lm_predictions.lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I think we need some more hacks to initialize the position_embeddings weights too :smiley:
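In the meantime, one hedged stopgap for the newly initialized decoder, continuing from the snippet above, is the usual MLM trick of tying it to the input word embeddings. The attribute paths here are my assumptions, read off the parameter names in the warning, not something I have verified against nbroad's code:

# Assumed paths: model.lm_predictions.lm_head.decoder and
# model.deberta.embeddings.word_embeddings (inferred from the warning above).
model.lm_predictions.lm_head.decoder.weight = model.deberta.embeddings.word_embeddings.weight
model.lm_predictions.lm_head.decoder.bias.data.zero_()  # bias has no pretrained counterpart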

Hope this helps!

There will be more changes coming soon to the main branch of transformers. The new changes will allow for the MLM head to produce reasonable suggestions for [MASK] tokens. Thanks for your patience!
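Once that lands, the standard fill-mask pipeline should work out of the box. A sketch of the expected usage (DeBERTa uses [MASK] as its mask token); the outputs will only be meaningful after the fix:

from transformers import pipeline

fill_mask = pipeline('fill-mask', model='microsoft/deberta-base')
print(fill_mask('Paris is the capital of [MASK].'))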
