Pre-trained DeBERTa - weak MLM performance, any hints?


I wanted to use DeBERTa, but the fill-mask preview of its unmasking abilities seems very bad.

I looked at the source code and cannot find where the absolute positions are added.

Can someone explain why the model performs so badly at the MLM preview? Maybe I overlooked the addition of the absolute positions in the source code. An explanation of the implementation would be really helpful as well!

Thank you!

Is this DeBERTa v3? The thing is that DeBERTa v3 is the discriminator, trained with Replaced Token Detection (RTD), not MLM. Although at some point they added the MLM heads, their paper doesn't mention running any MLM tests on the discriminator.
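To see why fill-mask is meaningless on the discriminator, here's a toy sketch (not the real DeBERTa code, and the sizes are made up): an MLM head projects each position onto the whole vocabulary, while an RTD head only produces one binary "was this token replaced?" score per position, so there is no vocabulary distribution to unmask from.

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen just for illustration.
hidden_size, vocab_size, seq_len = 16, 100, 8
hidden_states = torch.randn(1, seq_len, hidden_size)  # stand-in for encoder output

# MLM head (what the generator has): a score for every vocabulary item per position.
mlm_head = nn.Linear(hidden_size, vocab_size)
mlm_logits = mlm_head(hidden_states)
print(mlm_logits.shape)  # torch.Size([1, 8, 100]) -> distribution over the vocab

# RTD head (what the discriminator has): a single binary logit per position.
rtd_head = nn.Linear(hidden_size, 1)
rtd_logits = rtd_head(hidden_states)
print(rtd_logits.shape)  # torch.Size([1, 8, 1]) -> no vocab distribution at all
```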

Basically, MLM should yield really bad results with the discriminator, like it does here. You should download the generator model instead (the files pytorch_model.generator.bin and generator_config.json are available on xsmall, large and mdeberta, but missing on the base model) and MLM will run just fine.
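A minimal sketch of loading those generator files by hand, assuming the `microsoft/deberta-v3-xsmall` repo id and the file names above (verify them on the model page); the checkpoint's key names may not match `DebertaV2ForMaskedLM` exactly, hence `strict=False`:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, DebertaV2Config, DebertaV2ForMaskedLM

repo = "microsoft/deberta-v3-xsmall"  # assumption: also try large / mdeberta

# Fetch the generator's own config and weights (separate from the discriminator's).
cfg_path = hf_hub_download(repo, "generator_config.json")
weights_path = hf_hub_download(repo, "pytorch_model.generator.bin")

config = DebertaV2Config.from_json_file(cfg_path)
model = DebertaV2ForMaskedLM(config)
state_dict = torch.load(weights_path, map_location="cpu")
model.load_state_dict(state_dict, strict=False)  # key prefixes may need remapping
model.eval()

# The tokenizer is shared with the discriminator checkpoint.
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax()))
```

If many keys are reported missing by `load_state_dict`, inspect `state_dict.keys()` and strip or add the `deberta.` prefix as needed before loading.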