Hello, I’m trying to run this code:

from transformers import DebertaTokenizer, AutoModelWithLMHead

tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')

model = AutoModelWithLMHead.from_pretrained('microsoft/deberta-base')

and get this warning:

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: [‘lm_predictions.lm_head.dense.bias’, ‘lm_predictions.lm_head.LayerNorm.weight’, ‘lm_predictions.lm_head.dense.weight’, ‘lm_predictions.lm_head.bias’, ‘deberta.embeddings.position_embeddings.weight’, ‘lm_predictions.lm_head.LayerNorm.bias’]

- This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: [‘cls.predictions.transform.LayerNorm.bias’, ‘cls.predictions.transform.dense.weight’, ‘cls.predictions.transform.dense.bias’, ‘cls.predictions.transform.LayerNorm.weight’, ‘cls.predictions.bias’, ‘cls.predictions.decoder.weight’]

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

It seems that the checkpoint provided at microsoft/deberta-base doesn't contain the weights needed for the MLM head, so DebertaForMaskedLM cannot be used directly for masked-token prediction.
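One way to test this hypothesis is to compare the two key lists from the warning side by side. The sketch below is plain Python and only looks at the names printed above. The prefix mapping (`HYPOTHETICAL_PREFIX_MAP`) is my own guess based on the key names. It does not come from the transformers source, and the helper names `rename` and `match_keys` are made up for illustration.

```python
# Key lists copied verbatim from the warning above.
unused_in_checkpoint = [
    "lm_predictions.lm_head.dense.bias",
    "lm_predictions.lm_head.LayerNorm.weight",
    "lm_predictions.lm_head.dense.weight",
    "lm_predictions.lm_head.bias",
    "deberta.embeddings.position_embeddings.weight",
    "lm_predictions.lm_head.LayerNorm.bias",
]
newly_initialized = [
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.dense.weight",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.predictions.bias",
    "cls.predictions.decoder.weight",
]

# HYPOTHETICAL mapping guessed from the key names only (an assumption,
# not taken from the transformers library).
HYPOTHETICAL_PREFIX_MAP = {
    "lm_predictions.lm_head.dense.": "cls.predictions.transform.dense.",
    "lm_predictions.lm_head.LayerNorm.": "cls.predictions.transform.LayerNorm.",
    "lm_predictions.lm_head.bias": "cls.predictions.bias",
}


def rename(key):
    """Return the guessed model-side name for a checkpoint key, or None."""
    for old, new in HYPOTHETICAL_PREFIX_MAP.items():
        if key.startswith(old):
            return new + key[len(old):]
    return None


def match_keys(unused, missing):
    """Pair unused checkpoint keys with newly initialized model keys."""
    matched = {}
    for key in unused:
        candidate = rename(key)
        if candidate is not None and candidate in missing:
            matched[key] = candidate
    still_unused = sorted(set(unused) - set(matched))
    still_missing = sorted(set(missing) - set(matched.values()))
    return matched, still_unused, still_missing


matched, still_unused, still_missing = match_keys(unused_in_checkpoint, newly_initialized)
print(len(matched))    # 5
print(still_unused)    # ['deberta.embeddings.position_embeddings.weight']
print(still_missing)   # ['cls.predictions.decoder.weight']
```

If the guessed mapping is right, five of the six "newly initialized" head parameters do exist in the checkpoint, just under `lm_predictions.lm_head.*` names. That would point to a naming mismatch between the checkpoint and the class rather than missing weights. The leftover `cls.predictions.decoder.weight` is, in BERT-style heads, typically tied to the input word embeddings. This only compares names, though, and says nothing about whether the tensors actually correspond.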

Is this a bug in the DebertaForMaskedLM class? Or in the checkpoint provided by Microsoft? Or am I misunderstanding something?

Thank you!