Hi,
I wanted to use DeBERTa.
Somehow, its unmasking abilities in the preview seem very poor.
I looked at the source code and could not find where the absolute positions are added.
Can someone explain to me why the model performs so badly in the MLM preview?
Maybe I overlooked the addition of the absolute positions in the source code.
An explanation of the implementation would be really helpful as well!
Thank you!
Stephan