Pre-trained DeBERTa - Weak MLM performance, any hints?

Mintbach · June 20, 2023, 7:34am

Hi,

I wanted to use DeBERTa.
Somehow its unmasking abilities look very bad in the fill-mask preview on the model page.

I looked at the source code and could not find where the absolute positions are added.

Can someone explain to me why the model performs so badly in the MLM preview?
Maybe I overlooked where the absolute positions are added in the source code.
An explanation of the implementation would be really helpful as well!

Thank you!
Stephan

tgsc · July 21, 2023, 12:28am

Is this DeBERTa v3? The thing is that DeBERTa v3 is the discriminator trained with Replaced Token Detection (RTD), not MLM. Although they added MLM heads at some point, their paper doesn't mention testing the discriminators on MLM tasks.
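To make the difference between the two objectives concrete, here is a toy sketch in plain Python. The tokens and the "generator output" are made up, and this is not DeBERTa's training code; it only shows what each objective asks the model to predict. An MLM head must output the original token at each masked position, while an RTD discriminator only outputs a yes/no "was this token replaced?" label for every position. Following the ELECTRA-style convention, a generator sample that happens to equal the original token counts as "original".

```python
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mlm_targets(original: List[str], masked_positions: List[int]) -> Tuple[List[str], List[Optional[str]]]:
    """Return the masked input and the MLM labels (None = position not scored)."""
    inputs = list(original)
    labels: List[Optional[str]] = [None] * len(original)
    for i in masked_positions:
        inputs[i] = MASK
        labels[i] = original[i]  # the model must recover the original token
    return inputs, labels


def rtd_targets(original: List[str], corrupted: List[str]) -> List[int]:
    """RTD labels: 1 where the token differs from the original, 0 where it matches."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must be aligned token-by-token")
    return [int(o != c) for o, c in zip(original, corrupted)]


if __name__ == "__main__":
    original = ["the", "chef", "cooked", "the", "meal"]
    inputs, labels = mlm_targets(original, [1, 4])
    print(inputs)  # ['the', '[MASK]', 'cooked', 'the', '[MASK]']
    print(labels)  # [None, 'chef', None, None, 'meal']
    # Hypothetical generator output: position 1 filled wrongly, position 4 correctly.
    corrupted = ["the", "cook", "cooked", "the", "meal"]
    print(rtd_targets(original, corrupted))  # [0, 1, 0, 0, 0]
```

The discriminator's training signal is a binary label per token, never a choice over the vocabulary, which is why asking it to fill in a mask is outside what it was optimized for.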

Basically, MLM is expected to give really bad results with the discriminator, which is exactly what you are seeing. You should download the generator model instead (the files pytorch_model.generator.bin and generator_config.json in the xsmall, large, or mdeberta repos; they are missing from the base model), and MLM will run just fine.
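A minimal sketch of loading the generator files into a fill-mask pipeline, assuming `transformers`, `torch`, `huggingface_hub` and `sentencepiece` are installed and that the repo really contains the two file names given above. The default repo id `microsoft/deberta-v3-large` and the function name `load_generator` are illustrative choices, not something from this thread. One caveat: the parameter names inside the original generator checkpoint may not match the names `DebertaV2ForMaskedLM` expects in your transformers version, so the sketch loads with `strict=False` and returns the missing and unexpected keys for you to check.

```python
GENERATOR_WEIGHTS = "pytorch_model.generator.bin"  # file name as given in the post
GENERATOR_CONFIG = "generator_config.json"


def load_generator(repo_id: str = "microsoft/deberta-v3-large"):
    """Build a fill-mask pipeline from the generator checkpoint of a DeBERTa-v3 repo.

    Returns the pipeline plus the missing/unexpected state-dict keys, so you can
    verify that the MLM head weights were actually loaded.
    """
    # Heavy imports are done lazily so this module can be imported without them.
    import torch
    from huggingface_hub import hf_hub_download
    from transformers import AutoTokenizer, DebertaV2Config, DebertaV2ForMaskedLM, pipeline

    config_path = hf_hub_download(repo_id=repo_id, filename=GENERATOR_CONFIG)
    weights_path = hf_hub_download(repo_id=repo_id, filename=GENERATOR_WEIGHTS)

    # The generator has its own (smaller) config, so build the model from it.
    config = DebertaV2Config.from_json_file(config_path)
    model = DebertaV2ForMaskedLM(config)

    state_dict = torch.load(weights_path, map_location="cpu")
    # strict=False: checkpoint parameter names may differ from what transformers
    # expects; the returned key lists show what did and did not load.
    result = model.load_state_dict(state_dict, strict=False)
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    return fill_mask, list(result.missing_keys), list(result.unexpected_keys)
```

Usage would look like `fill_mask, missing, unexpected = load_generator("microsoft/mdeberta-v3-base")` followed by `fill_mask(f"Paris is the {fill_mask.tokenizer.mask_token} of France.")`. If `missing` lists the MLM head parameters, the head is still randomly initialized and predictions will look just as bad, so the checkpoint keys would need renaming before loading.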

Related topic	Category	Replies	Views	Activity
Pre-trained DeBERTa	🤗Transformers	0	208	June 14, 2023
DebertaForMaskedLM cannot load the parameters in the MLM head from microsoft/deberta-base	Models	3	1324	April 29, 2022
DeBERTa-v3: How to keep ELECTRA-style task-head?	Intermediate	5	2275	January 10, 2024
Fine-Tuning DeBERTa Produces Non-Results	🤗Transformers	3	3056	September 21, 2022
Deberta v3 Input length and Absolute positional embeddings	Models	0	177	September 30, 2023