Masked language model for BART (Not BERT)

Hi @ahadda5,

Are your config settings different?

For example, check the max length or the hidden layer dimension set in the config.

See the HF BART config docs.
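A quick way to spot such a mismatch is to diff the two configs field by field. The sketch below is only an illustration: `diff_configs` is a hypothetical helper, and the dictionary values are made up rather than taken from a real checkpoint. With transformers you could probably get such dictionaries from a config object's `to_dict()` method; in `BartConfig` the hidden dimension is named `d_model` and the max length `max_position_embeddings`.

```python
def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative values only, not taken from any real checkpoint.
config_a = {"max_position_embeddings": 1024, "d_model": 768, "vocab_size": 50265}
config_b = {"max_position_embeddings": 512, "d_model": 768, "vocab_size": 50265}

differences = diff_configs(config_a, config_b)
print(differences)  # {'max_position_embeddings': (1024, 512)}
```

Any key that shows up in the result is a setting worth checking first.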

Also, if you want to build BART for masked LM, add a final layer (or a few) that projects the hidden layer's output to your output dimension.

For example, in BertForMaskedLM, the class BertLMPredictionHead(nn.Module) maps the hidden dimension to the output (vocab) dimension:

self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

You can take hints from the layer structure of BertForMaskedLM to build a BART masked LM.
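Here is a rough sketch of what such a head could look like in plain PyTorch. It is not the actual BERT or BART implementation: `MaskedLMHead` is a hypothetical name, the sizes are toy values, and the random tensor only stands in for the hidden states a BART model would produce. Only the `nn.Linear` projection mirrors the line quoted above.

```python
import torch
import torch.nn as nn


class MaskedLMHead(nn.Module):
    """Hypothetical head: projects hidden states to vocabulary logits."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # Same shape as the decoder layer quoted from BertLMPredictionHead.
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) -> (batch, seq_len, vocab_size)
        return self.decoder(hidden_states)


# Toy sizes for illustration only; real values come from the model config.
hidden_size, vocab_size = 16, 100
head = MaskedLMHead(hidden_size, vocab_size)

# Stand-in for the hidden states produced by a BART-like model.
hidden_states = torch.randn(2, 5, hidden_size)
logits = head(hidden_states)

# Masked LM loss: positions labelled -100 are ignored by CrossEntropyLoss.
labels = torch.full((2, 5), -100, dtype=torch.long)
labels[0, 2] = 7  # pretend only this position was masked
loss = nn.CrossEntropyLoss()(logits.view(-1, vocab_size), labels.view(-1))
```

In a real setup, `hidden_size` would be the model's hidden dimension (`d_model` in the BART config) and `vocab_size` the tokenizer's vocabulary size.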

Hope this helps.