Does the BertForMaskedLM model require fine-tuning?

quantumlight · August 7, 2022, 10:08pm

Hi, I am interested in the BertForMaskedLM model. From the documentation, it seems it can predict the likelihood of a masked token directly from the pretrained BERT model. Is that right?

However, looking at the code:

https://github.com/huggingface/transformers/blob/v4.21.1/src/transformers/models/bert/modeling_bert.py#L1307


      
    def __init__(self, config):
        super().__init__(config)

        if config.is_decoder:
            logger.warning(
                "If you want to use `BertForMaskedLM` make sure `config.is_decoder=False` for "
                "bi-directional self-attention."
            )

        self.bert = BertModel(config, add_pooling_layer=False)
        self.cls = BertOnlyMLMHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings

It seems that there is an extra LM head that applies a linear layer to the output hidden vectors, and the result is then dot-producted with the vocabulary embeddings to produce the likelihood. I have three questions:

1. Where are the weights for this head loaded from? (from_pretrained should only load the weights for the BERT encoder, right?)
2. Or are they set to some default value each time? I noticed that running the model gave the same value each time.
3. Does using BERT this way to predict the likelihood of a masked token require additional training of the head, or does the pre-training that BERT has already gone through make that unnecessary?
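One way to probe these questions locally is to inspect a small, randomly initialised `BertForMaskedLM`. The following is a minimal sketch, assuming `torch` and `transformers` are installed. The tiny config values and the example token ids are hypothetical, chosen only so that it runs quickly and without downloading a checkpoint. It lists the parameters that live under the `cls.` head, checks whether the decoder shares its weight matrix with the input word embeddings, and checks whether head weights stored in a saved checkpoint come back through `from_pretrained`.

```python
import tempfile

import torch
from transformers import BertConfig, BertForMaskedLM

# Hypothetical tiny configuration so the sketch runs quickly and offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertForMaskedLM(config)  # randomly initialised, NOT pretrained
model.eval()

# Parameters that belong to the LM head (everything under `cls.`).
head_keys = [k for k in model.state_dict() if k.startswith("cls.")]
print(head_keys)

# With the default `tie_word_embeddings=True`, the decoder is expected to
# share its weight storage with the input word embeddings.
tied = (
    model.get_output_embeddings().weight.data_ptr()
    == model.get_input_embeddings().weight.data_ptr()
)
print("decoder tied to input embeddings:", tied)

# Round trip: save a checkpoint, reload it, and compare one head weight.
with tempfile.TemporaryDirectory() as tmp:
    model.save_pretrained(tmp)
    reloaded = BertForMaskedLM.from_pretrained(tmp)
reloaded.eval()

name = "cls.predictions.transform.dense.weight"
same_head = torch.equal(model.state_dict()[name], reloaded.state_dict()[name])
print("head restored from checkpoint:", same_head)

# The logits hold one score per vocabulary entry at every position;
# a softmax over the last axis turns them into per-token probabilities.
input_ids = torch.tensor([[1, 5, 7, 2]])  # arbitrary ids within the tiny vocab
with torch.no_grad():
    logits = model(input_ids=input_ids).logits
probs = logits.softmax(dim=-1)
print(probs.shape)
```

If the round trip restores the head, then whether a given pretrained checkpoint needs further training comes down to whether that checkpoint actually contains the `cls.*` weights. When loading a real checkpoint, the warnings printed by `from_pretrained` about newly initialised weights are a useful thing to look at.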
