I tried to fine-tune bert-large-uncased on the GLUE benchmark, loading the checkpoint from the Hub. When I looked at the loading details:
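To be concrete, this is roughly what I ran (a minimal sketch; `output_loading_info=True` exposes the missing/unexpected keys that the loading warning refers to):

```python
from transformers import BertForSequenceClassification

# Rough sketch: load the pre-trained checkpoint into a classification model
# and inspect which keys were missing from / unexpected in the checkpoint.
model, loading_info = BertForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2, output_loading_info=True
)

print(loading_info["missing_keys"])     # e.g. ['classifier.weight', 'classifier.bias']
print(loading_info["unexpected_keys"])  # e.g. the pre-training head keys (cls.*)
```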
My understanding is that the fine-tuning model for GLUE is BERT + a classifier, and the checkpoint only covers BERT, so the missing keys should belong to the classifier. But I think there is also a pooler on top of the BERT encoder, so the pooler keys should be missing too, right? (Why didn't I see them among the missing keys?) And I would have thought the unexpected keys could actually be assigned to the pooler.
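One way to check this is to look directly at what is stored in the checkpoint. A sketch, assuming the repo still ships a `pytorch_model.bin` file:

```python
import torch
from huggingface_hub import hf_hub_download

# Inspect the raw checkpoint to see whether pooler weights are stored in it
path = hf_hub_download("bert-large-uncased", "pytorch_model.bin")
state_dict = torch.load(path, map_location="cpu")

print([k for k in state_dict if "pooler" in k])         # pooler weights, if present
print([k for k in state_dict if k.startswith("cls.")])  # pre-training head weights
```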
I also noticed that during fine-tuning, not loading the pooler's weights (keeping them randomly initialized) works better.
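In case it helps, this is roughly how I re-initialize the pooler after loading, instead of using its pre-trained weights (a sketch; BERT's default init is a normal distribution with std `initializer_range`):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2
)

# Overwrite the loaded pooler weights with a fresh random init,
# matching BERT's own weight-initialization scheme.
pooler = model.bert.pooler.dense
pooler.weight.data.normal_(mean=0.0, std=model.config.initializer_range)
pooler.bias.data.zero_()
```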