Hi @ahadda5,
Are your config settings different? Check, for example, the config's maximum length or hidden layer dimension.
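One way to spot such a mismatch is to diff two configs field by field. This is only a sketch: it assumes the Hugging Face `transformers` library is installed, `config_diff` is a hypothetical helper (not a library function), and the value 512 is made up for illustration. Note that in `BartConfig` the hidden dimension is stored as `d_model`.

```python
from transformers import BartConfig


def config_diff(a, b):
    """Hypothetical helper: return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = a.to_dict(), b.to_dict()
    return {
        key: (da.get(key), db.get(key))
        for key in sorted(set(da) | set(db))
        if da.get(key) != db.get(key)
    }


# Default BART config versus a config with illustrative, made-up sizes.
reference = BartConfig()
custom = BartConfig(d_model=512, max_position_embeddings=512)

diff = config_diff(reference, custom)
for key, (ref_value, custom_value) in diff.items():
    print(f"{key}: reference={ref_value} custom={custom_value}")
```

If the maximum length or hidden dimension shows up in the diff, that is a likely place to start looking.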
Also, if you want to build BART for Masked LM, add a final layer that projects the hidden layer's output to your output dimension.
For example, in BertForMaskedLM, the class BertLMPredictionHead(nn.Module) maps the hidden dimension to the output (vocabulary) dimension:
self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
You can take hints from the layer structure of BertForMaskedLM when building a BART Masked LM.
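To make the idea concrete, here is a minimal PyTorch sketch of such a projection head. `SimpleLMHead` is a hypothetical name, the sizes are made up, and the random tensor only stands in for the model's real hidden states. The actual BertLMPredictionHead also applies a transform (dense layer, activation, LayerNorm) before the decoder, which is left out here.

```python
import torch
import torch.nn as nn


class SimpleLMHead(nn.Module):
    """Hypothetical head: projects hidden states to vocabulary logits."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        # Same shape of layer as the decoder line quoted above.
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Separate output bias, one value per vocabulary entry.
        self.bias = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, hidden_states):
        # (batch, seq_len, hidden_size) -> (batch, seq_len, vocab_size)
        return self.decoder(hidden_states) + self.bias


# Illustrative sizes only; use your own config's values.
hidden_size, vocab_size = 64, 1000
head = SimpleLMHead(hidden_size, vocab_size)

# Stand-in for the hidden states of the last layer.
hidden_states = torch.randn(2, 16, hidden_size)
logits = head(hidden_states)
print(tuple(logits.shape))
```

The last dimension of `logits` is the vocabulary size, so a cross-entropy loss against the masked token ids can be applied on top of it.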
Hope this helps.