DebertaForMaskedLM cannot load the parameters in the MLM head from microsoft/deberta-base

Hello, I’m trying to run this code:

from transformers import DebertaTokenizer, AutoModelWithLMHead

tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')
model = AutoModelWithLMHead.from_pretrained('microsoft/deberta-base')

and get this warning:

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.bias', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.LayerNorm.bias']

  • This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

It seems that the checkpoint provided by microsoft/deberta-base stores the MLM head weights under different names (lm_predictions.lm_head.*) than DebertaForMaskedLM expects (cls.predictions.*), so the pretrained head is discarded and DebertaForMaskedLM cannot be used directly for masked token prediction.
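A minimal sketch to confirm the name mismatch, assuming the repo ships a pytorch_model.bin file and that torch and huggingface_hub are available:

from huggingface_hub import hf_hub_download
import torch

# Download the raw checkpoint and inspect its parameter names.
path = hf_hub_download('microsoft/deberta-base', 'pytorch_model.bin')
state_dict = torch.load(path, map_location='cpu')

# The MLM head is stored under lm_predictions.*, while DebertaForMaskedLM
# looks for cls.predictions.*, so the pretrained head weights are dropped.
print([k for k in state_dict if k.startswith('lm_predictions')])
print([k for k in state_dict if k.startswith('cls.predictions')])  # likely empty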

Is this a bug in the DebertaForMaskedLM class, or in the checkpoint provided by Microsoft? Or am I misunderstanding something?

Thank you!


Same issue for me. Help, please?

Thanks!

I wouldn’t say it’s a complete solution, but it does initialize the weights whose names start with "lm_predictions". Check out the code here.

Thanks to @nbroad for his generous contribution.

from new_deberta import NewDebertaForMaskedLM

model_name = 'microsoft/deberta-base'
model = NewDebertaForMaskedLM.from_pretrained(model_name)

After this, I get:

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing NewDebertaForMaskedLM: ['deberta.embeddings.position_embeddings.weight']

  • This IS expected if you are initializing NewDebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing NewDebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of NewDebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['lm_predictions.lm_head.decoder.weight', 'lm_predictions.lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I think we need some more hacks to initialize the position_embeddings weights too :smiley:
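In the meantime, one hedged stopgap for the newly initialized decoder, continuing from the snippet above, is the usual MLM trick of tying it to the input word embeddings. The attribute paths here are my assumptions, read off the parameter names in the warning, not something I have verified against nbroad's code:

# Assumed paths: model.lm_predictions.lm_head.decoder and
# model.deberta.embeddings.word_embeddings (inferred from the warning above).
model.lm_predictions.lm_head.decoder.weight = model.deberta.embeddings.word_embeddings.weight
model.lm_predictions.lm_head.decoder.bias.data.zero_()  # bias has no pretrained counterpart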

Hope this helps!

There will be more changes coming soon to the main branch of transformers. The new changes will allow for the MLM head to produce reasonable suggestions for [MASK] tokens. Thanks for your patience!
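Once that lands, the standard fill-mask pipeline should work out of the box. A sketch of the expected usage (DeBERTa uses [MASK] as its mask token); the outputs will only be meaningful after the fix:

from transformers import pipeline

fill_mask = pipeline('fill-mask', model='microsoft/deberta-base')
print(fill_mask('Paris is the capital of [MASK].'))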
