Should I use BertModel or BertForMaskedLM?

Hi, I would like to know which class I should use to extract word embeddings.

Given a sentence, I just want to extract contextualised word embeddings.
However, I don’t really understand how to do it. In particular, I want to sum the last four hidden layers of BERT. What’s the difference between the hidden_states from BertForMaskedLM and the hidden_states from BertModel? Thank you in advance.

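from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
# Keep the full tokenizer output (a dict-like BatchEncoding) so it can be unpacked with **
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)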


from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
# Keep the full tokenizer output (a dict-like BatchEncoding) so it can be unpacked with **
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)

Both will return the same hidden states. BertForMaskedLM just adds a language modeling head on top of BertModel, so the hidden states (the initial embeddings plus the output of each layer of the Transformer encoder) are identical. Only the final output differs: BertForMaskedLM returns vocabulary logits, while BertModel returns last_hidden_state. As you don’t need to perform masked language modeling, you can just use BertModel, like so:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")  # dict-like, so **inputs works
outputs = model(**inputs, output_hidden_states=True)
contextualized_vector = outputs.last_hidden_state

This contextualized vector contains the last hidden states of the tokens of your input sentence. It’s a tensor of shape (batch_size, number of tokens, hidden size).
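
Since the question also asked about summing the last four hidden layers, here is a minimal sketch of one way to do it. The helper name sum_last_four_layers is made up for illustration, and the tiny randomly initialised BertConfig and the token ids are assumptions chosen only so the snippet runs without downloading weights; they are not part of the original answer.

```python
import torch
from transformers import BertConfig, BertModel


def sum_last_four_layers(hidden_states):
    """Sum the last four entries of `hidden_states` element-wise.

    `hidden_states` is the tuple returned with output_hidden_states=True:
    index 0 is the embedding output, the remaining entries are the encoder layers.
    """
    # Stack to (4, batch_size, num_tokens, hidden_size), then sum over the layer axis.
    return torch.stack(hidden_states[-4:], dim=0).sum(dim=0)


# Assumption: a tiny, randomly initialised BERT so the sketch runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=6,
                    num_attention_heads=4, intermediate_size=64)
model = BertModel(config).eval()

# Made-up token ids: a batch of 1 sentence with 5 tokens.
input_ids = torch.tensor([[1, 5, 7, 9, 2]])
with torch.no_grad():
    outputs = model(input_ids=input_ids, output_hidden_states=True)

token_vectors = sum_last_four_layers(outputs.hidden_states)
print(len(outputs.hidden_states))  # embeddings + one entry per encoder layer
print(token_vectors.shape)         # (batch_size, num_tokens, hidden_size)
```

With the pretrained model from the snippet above, the same call on outputs.hidden_states should give one vector of size 768 per token of the input sentence.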

OK, thank you, you’re right. But what about BertConfig? If I build the model from a config, the outputs are different:

from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

sent = "The capital of France is Paris."
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
config = AutoConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
model = AutoModelForMaskedLM.from_config(config)
print(model(**tokenizer(sent, return_tensors="pt"), output_hidden_states=True).hidden_states)
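
A likely explanation for the difference: from_config only builds the architecture described by the config and initialises the weights randomly; it does not load the pretrained checkpoint. The sketch below illustrates this with a tiny BertConfig, which is an assumption made only so the example runs without a download.

```python
import torch
from transformers import AutoModelForMaskedLM, BertConfig

# Assumption: a tiny BERT configuration so the sketch runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=64)

# from_config builds the architecture only; each call draws fresh random weights.
model_a = AutoModelForMaskedLM.from_config(config)
model_b = AutoModelForMaskedLM.from_config(config)

weights_a = model_a.get_input_embeddings().weight
weights_b = model_b.get_input_embeddings().weight

# Two models built from the same config do not share weights.
same_weights = torch.equal(weights_a, weights_b)
print(same_weights)
```

To keep the pretrained weights while still passing a custom config, AutoModelForMaskedLM.from_pretrained('bert-base-uncased', config=config) loads the checkpoint, so its hidden states should match the ones from the earlier snippets.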