I have a couple of questions about fine-tuning a language model:
How can I mask a specific, chosen portion of an input sentence instead of masking tokens at random?
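
To make the first question concrete, here is a rough sketch of what selective masking might look like on a list of token IDs. The helper `mask_positions` and the token IDs are made up for illustration; in practice the mask ID would presumably come from the tokenizer (e.g. `tokenizer.mask_token_id`), and the positions from whatever rule picks the span. The `-100` label follows the usual PyTorch convention of an ignored index in cross-entropy.

```python
from typing import List, Sequence, Tuple

# Label value that torch.nn.CrossEntropyLoss ignores by default.
IGNORE_INDEX = -100


def mask_positions(
    input_ids: Sequence[int],
    positions: Sequence[int],
    mask_token_id: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: mask only the chosen positions.

    Returns the masked input IDs and a label list that keeps the
    original ID at masked positions and IGNORE_INDEX everywhere else.
    """
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(masked)
    for pos in positions:
        labels[pos] = masked[pos]   # remember the original token
        masked[pos] = mask_token_id  # replace it with the mask token
    return masked, labels


# Made-up token IDs; 4 stands in for the mask token ID (an assumption).
ids = [2, 48, 25, 1289, 3]
masked_ids, labels = mask_positions(ids, positions=[2, 3], mask_token_id=4)
```

Because the positions are passed in explicitly, nothing here is random; a data collator that masks at random would be bypassed in favor of a step like this.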
For example, if I am using ALBERT and want to apply a loss function other than the standard MLM loss to the masked tokens, how do I access the model's output at the masked positions?
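
As a sketch of the second question, one way to pick out the outputs at masked positions is boolean indexing in PyTorch. The tensor `hidden_states` below is a random stand-in for the model output (with Hugging Face Transformers this would likely be `last_hidden_state` from `AlbertModel` or `logits` from `AlbertForMaskedLM`); the mask ID, the shapes, and the cosine-based loss are placeholders, not a recommendation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

mask_token_id = 4  # assumed mask token ID, for illustration only
input_ids = torch.tensor([[2, 48, 4, 4, 3]])  # (batch, seq_len)

# Stand-in for the model output, shape (batch, seq_len, hidden_size).
hidden_size = 8
hidden_states = torch.randn(input_ids.shape[0], input_ids.shape[1], hidden_size)

# Boolean mask marking where the input holds the mask token.
is_masked = input_ids == mask_token_id  # (batch, seq_len)

# Boolean indexing keeps only the rows at masked positions.
masked_outputs = hidden_states[is_masked]  # (num_masked, hidden_size)

# Placeholder custom loss: cosine distance to some target vectors.
targets = torch.randn(masked_outputs.shape)
loss = (1 - F.cosine_similarity(masked_outputs, targets, dim=-1)).mean()
```

Any loss can then be computed on `masked_outputs` alone, since the unmasked positions have already been dropped.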