Hello everyone, I have been building a BERT model with Masked Language Modeling as my pretraining objective. Now, all of a sudden, instead of a Masked Language Model I need the outputs of a Causal Language Model (predicting the n+1-th token of the sequence).
I was wondering whether it would be correct to keep this model, simply mask only the last token of each sequence during preprocessing, and re-train it. Do you think that's a good approach? Or should I build a new model specifically for autoregressive behaviour?
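To make the difference concrete, here is a minimal sketch (plain Python, no libraries, illustrative only) of the attention masks I understand to be involved: a BERT-style MLM lets every position attend to the full sequence, whereas a causal LM restricts each position to its prefix, which is why I'm unsure masking only the last token is enough.

```python
def bidirectional_mask(n):
    # MLM-style (BERT) mask: every position attends to every other position,
    # so even a masked last token can "see" the whole sequence.
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    # Causal-LM-style mask: position i attends only to positions 0..i,
    # which enforces next-token prediction at every position.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
```

My doubt is exactly this: with the bidirectional mask, predicting the last token is not the same training signal as a truly autoregressive model trained with the triangular mask.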
Thank you very much!