Is masking still used when finetuning a BERT model?

tueboesen · July 29, 2020, 4:55pm

I’m trying to finetune a BERT model to do token classification, and I’m wondering exactly how the finetuning is done. When I pretrained the model, I used masked language modeling and masked out 15% of the tokens, as suggested in the original paper. Now that I’m finetuning the model, do I also mask out 15% of the input/target data, or do I stop masking and just train on the unmasked data?

valhalla · July 29, 2020, 5:06pm

Hi @tueboesen, with BERT, masked language modelling (MLM) is used only as a pre-training task. For fine-tuning, MLM is not used: you train on the unmasked inputs with your task labels.
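
To make the difference concrete, here is a minimal sketch in plain Python that contrasts how a batch might be prepared in the two phases. The function names (`mask_for_mlm`, `prepare_for_token_classification`) and the example token ids are hypothetical and not taken from any library. The 80/10/10 replacement split follows the original BERT paper, and `-100` is the label value that PyTorch's `CrossEntropyLoss` ignores by default. For brevity the sketch does not exempt special tokens such as `[CLS]` or `[SEP]` from masking, which a real pre-training pipeline normally would.

```python
import random

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_for_mlm(token_ids, mask_id, vocab_size, rng, mask_prob=0.15):
    """Pre-training (MLM): corrupt roughly mask_prob of the positions.

    Labels hold the original token at corrupted positions and
    IGNORE_INDEX everywhere else, so only those positions are scored.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected, leave it untouched
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the [MASK] id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


def prepare_for_token_classification(token_ids, tag_ids):
    """Fine-tuning: inputs stay exactly as tokenized, labels are the tags."""
    if len(token_ids) != len(tag_ids):
        raise ValueError("need exactly one tag per token")
    return list(token_ids), list(tag_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [101, 7592, 2088, 2003, 2307, 102]  # made-up ids for illustration
    tags = [0, 1, 2, 0, 0, 0]                    # made-up per-token class ids
    print(mask_for_mlm(tokens, mask_id=103, vocab_size=30522, rng=rng))
    print(prepare_for_token_classification(tokens, tags))
```

In the fine-tuning function nothing is replaced or hidden: the model sees the full input and the loss is computed against the task labels, which is why the 15% masking step is simply dropped.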
