When I load a BertForPreTraining model with pretrained weights using
model_pretrain = BertForPreTraining.from_pretrained('bert-base-uncased')
I get the following warning:
Some weights of BertForPreTraining were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['cls.predictions.decoder.bias']
Why aren’t all the weights in cls.predictions
initialized from the saved checkpoint?
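(In case it's useful, the same information can be pulled out programmatically; this is just a quick sketch using the output_loading_info flag of from_pretrained:)

from transformers import BertForPreTraining

# Load again, this time asking for details about how the checkpoint matched the model.
model_pretrain, loading_info = BertForPreTraining.from_pretrained(
    'bert-base-uncased', output_loading_info=True
)
print(loading_info["missing_keys"])     # weights the model expects but the checkpoint doesn't provide
print(loading_info["unexpected_keys"])  # checkpoint weights the model has no parameter for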
The model seems to produce reliable token prediction outputs (without further training). In particular, it produces the same outputs as a model loaded with
model_masked = BertForMaskedLM.from_pretrained('bert-base-uncased')
Here’s code verifying this in an example:
s = ("Pop superstar Shakira says she was the [MASK] of a random [MASK] by a [MASK] "
"of [MASK] boars while walking in a [MASK] in Barcelona with her eight-year-old "
"[MASK].")
inputs = tokenizer(s, return_tensors='pt')
outputs_pretrain = model_pretrain(**inputs)
outputs_masked = model_masked(**inputs)
assert torch.allclose(outputs_pretrain["prediction_logits"], outputs_masked["logits"])
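(And to spot-check the token predictions themselves, here's a small sketch that decodes the argmax prediction at each [MASK] position, reusing the objects above:)

# Locate the [MASK] positions and decode the most likely token at each one.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = outputs_pretrain["prediction_logits"][0, mask_positions].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens(predicted_ids.tolist()))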
Incidentally, when loading model_masked, I don't get a warning about newly initialized weights in cls.predictions. The warning I do get concerns the checkpoint weights in cls.seq_relationship, which simply go unused. That seems reasonable: if we only care about masked LM, the next-sentence-prediction head stored in the checkpoint can safely be thrown away.
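(For completeness, the same output_loading_info trick as above can list exactly which checkpoint weights BertForMaskedLM discards; again just a sketch:)

from transformers import BertForMaskedLM

model_masked, loading_info = BertForMaskedLM.from_pretrained(
    'bert-base-uncased', output_loading_info=True
)
print(loading_info["unexpected_keys"])  # checkpoint weights BertForMaskedLM has no parameter for
print(loading_info["missing_keys"])     # weights BertForMaskedLM expects but the checkpoint lacks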