Fine-tune BERT for Masked Language Modeling


I have used a pre-trained BERT model via Hugging Face Transformers for a project. I would like to know how to fine-tune BERT for masked language modeling, for a task like spelling correction. The links "" and "" are not found, although they seemed to be a great resource. I would also like to know the dataset format that BertForMaskedLM requires for training (i.e. what kind of inputs and labels should be given to the model). I would be grateful if anyone could help me in this regard.



Interested in this too…


It seems the "lm_finetuning" script is no longer active.
There is this:

The training data text looks like this:

traindata = {
    'text': [
        ...
    ]
}

When building the MLM training data, the masked traindata is used as the model input and the original traindata as the label. For example:

input = '我们[MASK]天出去玩吧'  # the mask position is random; the label is the
                               # original unmasked sentence, e.g. '我们明天出去玩吧'
                               # ("let's go out tomorrow")
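To make that input/label format concrete, here is a minimal self-contained sketch of the masking logic. The token ids, `MASK_ID` value, and `mask_prob` are illustrative assumptions; real code would use a BERT tokenizer and, in Transformers, you would normally let `DataCollatorForLanguageModeling(mlm=True)` do this for you.

```python
import random

MASK_ID = 103        # [MASK] token id in the standard bert-base vocabularies
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def make_mlm_example(token_ids, mask_prob=0.15, seed=0):
    """Randomly mask tokens. The labels hold the original token id at each
    masked position and -100 everywhere else, which is the format
    BertForMaskedLM expects for its `labels` argument."""
    rng = random.Random(seed)
    input_ids = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tid in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tid         # the model must predict the original token
            input_ids[i] = MASK_ID  # the model sees [MASK] here
    return input_ids, labels

# Example with toy (made-up) token ids:
inp, lab = make_mlm_example([5, 6, 7, 8, 9, 10], mask_prob=0.5, seed=0)
```

The key point is the -100 convention: positions the model did not have to predict get the label -100, so they contribute nothing to the loss, and only the masked positions are scored.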