Questions on the `BertLMHeadModel`

Hello,
The HuggingFace Transformers documentation seems to indicate that `BertLMHeadModel` can be used for causal language modeling (https://huggingface.co/transformers/model_doc/bert.html#bertmodellmheadmodel). If you look at the values returned by this model, they include a `CausalLMOutput`. Doesn't the term "causal language modeling" refer to regular (left-to-right) language modeling, as in the case of GPT-2? I am not so interested in the accuracy of the results; my intention is to examine the distribution of the attention weights.

Also, when providing `labels` for causal language modeling with `BertLMHeadModel`, can I just set `labels = input_ids` for convenience, as in the case of GPT-2?
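For concreteness, here is a minimal sketch of what I have in mind (the checkpoint name is just an example, and I'm assuming the model shifts the labels internally the way GPT-2 does):

```python
import torch
from transformers import BertConfig, BertLMHeadModel, BertTokenizer

# Assumption: bert-base-uncased; any BERT checkpoint should work the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The docs say the model should be configured as a decoder
# (is_decoder=True) to be used for causal language modeling.
config = BertConfig.from_pretrained("bert-base-uncased")
config.is_decoder = True
model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")

# Passing labels = input_ids, as one would for GPT-2, and requesting
# the attention weights so their distribution can be inspected.
outputs = model(
    **inputs,
    labels=inputs["input_ids"],
    output_attentions=True,
)

print(outputs.loss)                 # causal LM loss
print(len(outputs.attentions))      # one attention tensor per layer
print(outputs.attentions[0].shape)  # (batch, heads, seq_len, seq_len)
```

Is this the intended usage, or does `BertLMHeadModel` expect the labels to be shifted by the caller?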

Thank you,