Questions about loading a checkpoint with `.from_pretrained`

I tried to fine-tune `bert-large-uncased` on the GLUE benchmark and loaded the checkpoint from the Hub. When I looked at the loading details, I noticed that both missing keys and unexpected keys were reported.

My understanding is that the fine-tuning model for GLUE is BERT plus a classification head, while the checkpoint only covers BERT, so the missing keys should belong to the classifier. But I think there is also a pooler on top of the BERT encoder, so shouldn't the pooler keys be missing as well? (Why don't I see them among the missing keys?) And I think the unexpected keys could actually be assigned to the pooler.
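For reference, here is a minimal sketch of how I inspect which keys are missing vs. unexpected when loading; the number of labels is just an example and not tied to a specific GLUE task:

```python
from transformers import BertForSequenceClassification

# Load the pretrained encoder into a sequence-classification model and also
# return the loading info (missing/unexpected key lists) instead of only a warning.
model, loading_info = BertForSequenceClassification.from_pretrained(
    "bert-large-uncased",
    num_labels=2,              # example: a binary GLUE task
    output_loading_info=True,  # also return the dict with missing/unexpected keys
)

# Keys the model needs but the checkpoint does not provide
# (typically the classifier head's weight and bias).
print("missing keys:", loading_info["missing_keys"])

# Keys the checkpoint provides but this model does not use
# (typically the pretraining heads, e.g. cls.predictions.* / cls.seq_relationship.*).
print("unexpected keys:", loading_info["unexpected_keys"])
```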

I also noticed that during fine-tuning, not loading the pooler's weights (keeping them randomly initialized instead) seems to work better. :thinking:
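To be concrete about what I mean by keeping the pooler randomly initialized, here is a sketch; it assumes the `model` object from the snippet above and BERT's default `initializer_range` of 0.02:

```python
import torch.nn as nn

# Re-initialize the pooler's dense layer in place so the pretrained pooler
# weights are effectively discarded (normal init for the weight, zeros for the bias).
nn.init.normal_(model.bert.pooler.dense.weight, std=0.02)
nn.init.zeros_(model.bert.pooler.dense.bias)
```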


This seems like basically expected behavior.
