I have a few questions about the BertLMHeadModel used to conduct regular language modeling (next-token prediction). For the GPT2LMHeadModel, I can just specify labels = input_ids for convenience. Can I specify the labels in this way for the BertLMHeadModel as well?
Do you mean the BertLMHeadModel? If yes, then it's intended to be used with the EncoderDecoder model, which allows you to use a pre-trained encoder (like BERT) as both encoder and decoder for seq2seq tasks. It's not intended for language modeling.
While you can use that class as a standalone decoder by setting is_decoder=True in the config, it might not give you good results, as BERT is trained as an encoder.
The HuggingFace Transformers documentation seems to indicate that BertLMHeadModel can be used for causal language modeling (https://huggingface.co/transformers/model_doc/bert.html#bertmodellmheadmodel). If you look at the values returned by this model, they include a CausalLMOutput. Doesn't the term "causal language modeling" refer to regular language modeling, as in the case of GPT-2? I am not so interested in the accuracy of the results; my intention is to examine the distribution of the attention weights.
Also, when providing labels for causal language modeling with the BertLMHeadModel, can I just use labels = input_ids, as in the case of GPT-2, for convenience?
That's what I said in the last comment: it can be used as a standalone decoder (standalone decoder = causal LM).
Yes, you can pass labels = input_ids.
Sorry, I have an additional question, this time about the BertForMaskedLM model. The documentation for BertForMaskedLM provides the following example to illustrate the model's usage:
>>> from transformers import BertTokenizer, BertForMaskedLM
>>> import torch
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
>>> input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt")["input_ids"]
>>> outputs = model(input_ids, labels=input_ids)
>>> loss = outputs.loss
>>> prediction_logits = outputs.logits
In the example above, I don't see any [MASK] token in the input; can the BertForMaskedLM model really be used with an input string that does not include a [MASK] token? If I provide the BertForMaskedLM model an input string without the [MASK] token, from which token will the output of the model be produced? In this case, would BertForMaskedLM automatically insert a [MASK] token at the beginning of the input sequence?
Thank you again,
That's probably a mistake. This might help.
Yes it’s definitely a mistake. Will fix this morning.