Calculating accuracy while fine-tuning BERTForMaskedLM


While fine-tuning, we can only see the loss and perplexity, which are useful.
Is it also possible to see the accuracy of the model, and to use TensorBoard, when using the “” script? It would be really helpful if anyone could explain how the “loss” is calculated for the BERTForMaskedLM task (as there are no labels provided while fine-tuning).

To replicate the original training loss from the paper, it should be calculated as follows: “The training loss is the sum of the mean masked LM likelihood and the mean next sentence prediction likelihood.” You can find more details in the BERT paper.
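For the masked LM part alone, here is a minimal sketch of how the loss, perplexity and accuracy can be computed from the model's logits. It is plain Python rather than the library's own code, and `masked_lm_metrics` and the toy values are made up for illustration. It assumes the usual convention that positions which were not masked carry the label -100 and are ignored, and that perplexity is the exponential of the mean loss.

```python
import math

IGNORE_INDEX = -100  # assumed convention: label for positions that were not masked


def log_softmax(scores):
    """Numerically stable log-softmax over one position's vocabulary scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scores))
    return [x - log_sum for x in scores]


def masked_lm_metrics(logits, labels):
    """Mean cross-entropy, perplexity and accuracy over masked positions only.

    logits: one list of vocabulary scores per position.
    labels: the original token id at masked positions, IGNORE_INDEX elsewhere.
    """
    total_nll, correct, counted = 0.0, 0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # unmasked positions do not contribute to the loss
        total_nll -= log_softmax(scores)[label]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
        counted += 1
    if counted == 0:
        raise ValueError("no masked positions to score")
    loss = total_nll / counted
    return {"loss": loss, "perplexity": math.exp(loss), "accuracy": correct / counted}


# Toy example: vocabulary of 3 tokens, 4 positions, 2 of them masked.
logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0], [0.0, 2.0, 0.0]]
labels = [IGNORE_INDEX, 2, IGNORE_INDEX, 0]
metrics = masked_lm_metrics(logits, labels)
print(metrics)
```

The same idea carries over to a real evaluation loop: take the argmax of the logits, compare it with the labels, and count only the positions whose label is not -100.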

Thanks for your reply @vblagoje. Is it also possible to provide labels when fine-tuning on the BERTForMaskedLM task? I was following this example.

I suspect you might be mixing up the notions of BERT pre-training and BERT fine-tuning. BERT pre-training is used to train the BERT model itself, which is then fine-tuned for downstream tasks. Very few people (mostly researchers) do BERT pre-training; developers mostly use the pre-trained models available on the HF hub for their particular tasks. This is where labels likely become relevant. In BERT pre-training there are no human-provided labels: it’s a self-supervised task in which the original tokens at the masked positions serve as the targets.
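To make the "no labels" point concrete, here is a simplified sketch of how masked LM targets can be derived from the text itself. The function name `mask_tokens` and its parameters are made up for illustration. It also simplifies the scheme from the BERT paper, which replaces a selected token with [MASK] only 80% of the time (10% random token, 10% unchanged), and it uses token strings instead of ids for readability.

```python
import random

IGNORE_INDEX = -100  # assumed convention: unmasked positions are skipped by the loss
MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for masked LM training.

    Each token is masked independently with probability mask_prob.
    The label is the original token at masked positions and
    IGNORE_INDEX everywhere else.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(tok)  # the model must recover the original token
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


tokens = "the cat sat on the mat".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(inputs)
print(labels)
```

So the labels exist, but they are generated from the input rather than supplied in the dataset.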

I am actually working on a “spelling correction” task. For this task I have pre-trained the BERT model with the masked language modeling objective. After pre-training, I want to fine-tune the model. A dataset for spelling correction usually contains each incorrect sentence together with its correct version. So how can I provide a dataset that contains both the incorrect and the correct versions while fine-tuning (that is, how do I give the labels? This is the part I don't understand)? I would be grateful if you could help me with this.

Aha, I get it. For a spelling correction task you likely need to start from the token classification examples and take it from there.
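One possible reading of this suggestion, sketched under assumptions: turn each (incorrect, correct) sentence pair into per-word labels, so that a token classification model learns to flag misspelled words. The helper `label_tokens` and the "OK"/"WRONG" tags are made up for illustration, and the sketch assumes both sentences have the same number of whitespace-separated words (no split or merged words).

```python
def label_tokens(incorrect, correct):
    """Pair each word of the incorrect sentence with a tag and its correction.

    Returns a list of (word, tag, target) tuples, where tag is "WRONG"
    if the word differs from the corrected sentence and "OK" otherwise.
    """
    wrong_words = incorrect.split()
    right_words = correct.split()
    if len(wrong_words) != len(right_words):
        # Assumption of this sketch: word-by-word alignment is possible.
        raise ValueError("sentences must have the same number of words")
    return [
        (w, "OK" if w == r else "WRONG", r)
        for w, r in zip(wrong_words, right_words)
    ]


examples = label_tokens("I havv a dog", "I have a dog")
for word, tag, target in examples:
    print(word, tag, target)
```

The tags would be the per-token labels for fine-tuning, while the third field keeps the corrected word around in case it is needed as a target for a later correction step.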

@vblagoje Thank you for your reply.

Here is how I am currently dealing with the task:

I am using BERT by masking the misspelled word to get predictions along with their probability scores. However, the results are not very good, so I thought of fine-tuning BERT.

I have checked the token classification examples, but I am not sure how token classification will help with my task. Could you please elaborate a bit more?