Questions on the `BertLMHeadModel`

Hello,
The HuggingFace Transformers documentation seems to indicate that `BertLMHeadModel` can be used for causal language modeling (https://huggingface.co/transformers/model_doc/bert.html#bertmodellmheadmodel). If you look at the values returned by this model, they include a `CausalLMOutput`. Doesn't the term "causal language modeling" refer to regular (left-to-right) language modeling, as in the case of GPT-2? I am not so interested in the accuracy of the results; my intention is to examine the distribution of the attention weights.

Also, when providing `labels` for causal language modeling with `BertLMHeadModel`, can I just set `labels = input_ids` for convenience, as in the case of GPT-2?
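For concreteness, here is a minimal sketch of what I have in mind (the checkpoint name is just an example, and I'm assuming the model shifts the labels internally the way GPT-2 does):

```python
import torch
from transformers import BertConfig, BertLMHeadModel, BertTokenizer

# Assumption: bert-base-uncased; any BERT checkpoint should work the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The docs say the model should be configured as a decoder
# (is_decoder=True) to be used for causal language modeling.
config = BertConfig.from_pretrained("bert-base-uncased")
config.is_decoder = True
model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")

# Passing labels = input_ids, as one would for GPT-2, and requesting
# the attention weights so their distribution can be inspected.
outputs = model(
    **inputs,
    labels=inputs["input_ids"],
    output_attentions=True,
)

print(outputs.loss)                 # causal LM loss
print(len(outputs.attentions))      # one attention tensor per layer
print(outputs.attentions[0].shape)  # (batch, heads, seq_len, seq_len)
```

Is this the intended usage, or does `BertLMHeadModel` expect the labels to be shifted by the caller?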

Thank you,