Questions on the `BertModelLMHeadModel`

h56cho · September 30, 2020, 5:36pm

Hello,
I have a few questions about the BertModelLMHeadModel:

Is BertModelLMHeadModel used for regular language modeling (next-token prediction), as is the case for GPT2LMHeadModel?
For GPT2LMHeadModel, I can simply specify labels = input_ids for convenience. Can I specify the labels the same way for BertModelLMHeadModel as well?

Thanks,

valhalla · October 2, 2020, 2:55pm

Hi @h56cho
do you mean the BertLMHeadModel ?

If yes, then it’s intended to be used with the EncoderDecoder model, which allows you to use a pre-trained encoder as both the encoder and the decoder for seq2seq tasks. It’s not intended for language modeling.
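A minimal sketch of that pairing, assuming PyTorch and a recent transformers release. With real checkpoints the usual entry point is `EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")`; so that it runs offline, the sketch instead builds a tiny, randomly initialized config (all sizes and token ids are arbitrary) and checks that the decoder half is a `BertLMHeadModel` with cross-attention enabled.

```python
import torch
from transformers import (
    BertConfig,
    BertLMHeadModel,
    BertModel,
    EncoderDecoderConfig,
    EncoderDecoderModel,
)

# Tiny, arbitrary sizes so the example runs without downloading weights.
small = dict(vocab_size=100, hidden_size=32, num_hidden_layers=2,
             num_attention_heads=2, intermediate_size=64)
encoder_config = BertConfig(**small)
decoder_config = BertConfig(**small)

# from_encoder_decoder_configs marks the decoder config as a decoder
# (is_decoder=True) and enables cross-attention to the encoder outputs.
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)
model.eval()  # disable dropout

torch.manual_seed(0)
source_ids = torch.randint(0, 100, (1, 7))  # arbitrary source tokens
target_ids = torch.randint(0, 100, (1, 5))  # arbitrary decoder input tokens

with torch.no_grad():
    outputs = model(input_ids=source_ids, decoder_input_ids=target_ids)

# The encoder is a plain BertModel; the decoder is a BertLMHeadModel.
print(type(model.encoder).__name__, type(model.decoder).__name__)
print(tuple(outputs.logits.shape))
```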

While you can use that class as a standalone decoder by passing is_decoder=True to the config, it might not give you good results, since it was pre-trained as a (bidirectional) encoder.
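To see what `is_decoder=True` changes, here is a sketch under similar assumptions (PyTorch, a recent transformers release, a tiny random config with arbitrary sizes and token ids; `tiny_bert_lm` and `prefix_logits_unchanged` are hypothetical helpers). It edits only the last input token and checks whether the logits at earlier positions move: with the causal mask they should not, while the default bidirectional BERT lets every position see the edit.

```python
import torch
from transformers import BertConfig, BertLMHeadModel


def tiny_bert_lm(is_decoder: bool) -> BertLMHeadModel:
    # Arbitrary small sizes; with a checkpoint one would instead write e.g.
    # BertLMHeadModel.from_pretrained("bert-base-uncased", is_decoder=True).
    torch.manual_seed(0)
    config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=64,
                        is_decoder=is_decoder)
    return BertLMHeadModel(config).eval()  # eval() turns off dropout


def prefix_logits_unchanged(model: BertLMHeadModel) -> bool:
    """Does editing the last token leave the logits at earlier positions unchanged?"""
    ids = torch.tensor([[5, 17, 42, 8, 63]])
    edited = ids.clone()
    edited[0, -1] = 99  # change only the final token
    with torch.no_grad():
        a = model(input_ids=ids).logits
        b = model(input_ids=edited).logits
    return torch.allclose(a[:, :-1], b[:, :-1], atol=1e-6)


causal = prefix_logits_unchanged(tiny_bert_lm(is_decoder=True))
bidirectional = prefix_logits_unchanged(tiny_bert_lm(is_decoder=False))
print(causal, bidirectional)
```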

h56cho · October 2, 2020, 4:42pm

Hello,
The HuggingFace Transformers documentation seems to indicate that BertLMHeadModel can be used for causal language modeling (https://huggingface.co/transformers/model_doc/bert.html#bertmodellmheadmodel). If you look at the values returned by this model, they include CausalLMOutput. Doesn’t the term “causal language modeling” refer to regular language modeling, as in the case of GPT-2? I am not very interested in the accuracy of the results; my intention is to examine the distribution of the attention weights.

Also, when providing “labels” for causal language modeling with BertLMHeadModel, can I just use labels = input_ids for convenience, as in the case of GPT-2?

Thank you,
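For inspecting attention weights, a sketch assuming PyTorch and a recent transformers release: `output_attentions=True` returns one tensor per layer, and `attn_implementation="eager"` is set on the config on the assumption that the eager attention path is the one that reliably returns the weights. Sizes and token ids are arbitrary. With `is_decoder=True`, the strictly upper triangle of each attention map should be zero.

```python
import torch
from transformers import BertConfig, BertLMHeadModel

torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    is_decoder=True,
                    attn_implementation="eager")  # assumption: eager path returns weights
model = BertLMHeadModel(config).eval()

ids = torch.tensor([[5, 17, 42, 8, 63]])  # arbitrary token ids
with torch.no_grad():
    out = model(input_ids=ids, output_attentions=True)

# One tensor per layer, each of shape (batch, num_heads, query_len, key_len).
first_layer = out.attentions[0][0]  # (num_heads, seq_len, seq_len)
print(len(out.attentions), tuple(first_layer.shape))

# Each query position attends only to itself and earlier positions,
# so everything above the diagonal should be (numerically) zero.
upper = torch.triu(first_layer, diagonal=1)
print(upper.abs().max().item())
```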

valhalla · October 2, 2020, 5:02pm

That’s what I said in my last comment: it can be used as a standalone decoder (standalone decoder = causal LM).

Yes, you can pass labels = input_ids
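Why `labels = input_ids` works: the causal-LM head shifts the labels internally, so the logits at position t are scored against the token at position t + 1, and the last position has no target. The sketch below (assuming PyTorch, a recent transformers release, and a tiny random config with arbitrary sizes and ids) recomputes that shifted cross-entropy by hand and compares it with `outputs.loss`. Padding positions would normally be set to -100 in `labels` so the loss ignores them.

```python
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertLMHeadModel

torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, is_decoder=True)
model = BertLMHeadModel(config).eval()

input_ids = torch.tensor([[5, 17, 42, 8, 63]])  # arbitrary token ids
with torch.no_grad():
    out = model(input_ids=input_ids, labels=input_ids)

# Manual next-token loss: drop the last logit and the first label.
shifted_logits = out.logits[:, :-1, :]
shifted_targets = input_ids[:, 1:]
manual_loss = F.cross_entropy(
    shifted_logits.reshape(-1, config.vocab_size), shifted_targets.reshape(-1)
)
print(out.loss.item(), manual_loss.item())
```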

h56cho · October 2, 2020, 5:03pm

Thank you!

h56cho · October 2, 2020, 8:27pm

Hello,
Sorry, I have an additional question. This one is about the BertForMaskedLM model.
The documentation for BertForMaskedLM provides the following example to illustrate the model’s usage:

>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
>>> input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt")["input_ids"]

>>> outputs = model(input_ids, labels=input_ids)
>>> loss = outputs.loss
>>> prediction_logits = outputs.logits

In the example above, I don’t see any [MASK] token in the input. Can BertForMaskedLM really be used with an input string that does not contain a [MASK] token? If I give BertForMaskedLM an input string without a [MASK] token, from which token will the model’s output be produced? In that case, would BertForMaskedLM automatically insert a [MASK] token at the beginning of the input sequence?

Thank you again,

valhalla · October 5, 2020, 10:41am

That’s probably a mistake.

This might help.

sgugger · October 5, 2020, 1:36pm

Yes, it’s definitely a mistake. Will fix this morning.
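For reference, a sketch of how BertForMaskedLM is normally fed, assuming PyTorch and a recent transformers release. The model does not insert a [MASK] token by itself; it produces logits at every position, and the loss is computed only where `labels` is not -100. With labels = input_ids and no [MASK] in the input, every position is scored while the model can see the very token it must predict, which is why the quoted example is misleading. The tiny config, token ids, and `MASK_ID` below are hypothetical; with a real tokenizer one would use `tokenizer.mask_token_id`, and `DataCollatorForLanguageModeling` (with `mlm_probability`) can pick the masked positions randomly.

```python
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForMaskedLM

MASK_ID = 3  # hypothetical [MASK] id for this toy vocab

torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertForMaskedLM(config).eval()

original_ids = torch.tensor([[5, 17, 42, 8, 63]])  # arbitrary token ids
masked_positions = torch.tensor([[False, True, False, True, False]])

# Inputs: replace the chosen tokens with [MASK].
input_ids = original_ids.masked_fill(masked_positions, MASK_ID)
# Labels: original token at masked positions, -100 (ignored by the loss) elsewhere.
labels = original_ids.masked_fill(~masked_positions, -100)

with torch.no_grad():
    out = model(input_ids=input_ids, labels=labels)

# The loss averages over masked positions only; unlike the causal LM, no shift.
manual_loss = F.cross_entropy(out.logits[masked_positions], original_ids[masked_positions])
print(out.loss.item(), manual_loss.item())
```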
