Unable to load checkpoint after finetuning

Hi,
I’m running into trouble loading the model weights after finetuning a pretrained base model. The steps I’m taking:

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("model/repo")

model.classifier = torch.nn.Sequential(
    torch.nn.Dropout(0.4),
    torch.nn.Linear(model.config.hidden_size, 256),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.4),
    torch.nn.Linear(256, 2),
)

This gives a warning, which I guess is fine for the moment:

Some weights of the model checkpoint at model/repo were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at model/repo and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
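
For reference, the two lists in this warning amount to set differences between the parameter names stored in the checkpoint and the names the model class expects. A minimal sketch in plain Python, using a few key names from the warning above plus one illustrative shared key; `diff_keys` is a hypothetical helper, not the transformers implementation:

```python
def diff_keys(checkpoint_keys, model_keys):
    """Return (unused, newly_initialized) parameter names, each sorted."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    # In the checkpoint, but the model has no parameter with that name.
    unused = sorted(checkpoint_keys - model_keys)
    # Expected by the model, but absent from the checkpoint.
    newly_initialized = sorted(model_keys - checkpoint_keys)
    return unused, newly_initialized


# Illustrative subsets of the real key lists.
checkpoint_keys = [
    "bert.embeddings.word_embeddings.weight",
    "cls.predictions.bias",
    "cls.predictions.transform.dense.weight",
]
model_keys = [
    "bert.embeddings.word_embeddings.weight",
    "bert.pooler.dense.weight",
    "classifier.weight",
    "classifier.bias",
]

unused, newly_initialized = diff_keys(checkpoint_keys, model_keys)
print("not used:", unused)
print("newly initialized:", newly_initialized)
```

Any name that appears on only one side ends up in one of the two lists, which is why swapping the model class or the head changes what the warning reports.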

But after finetuning with hyperparameter search:

trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=10,
)

and then trying to load the model with the following config, I’m still getting the same warning as before:

from transformers import AutoConfig, AutoModel

model = AutoModel.from_pretrained("./run-3/checkpoint-1880", config=model.config)

Is there something I’m missing?
Thanks for looking into this.

I have the same issue. When I try to resume training, I get similar behavior. I don’t see why only the classification head weights aren’t saved while all other weights are.
There were missing keys in the checkpoint model loaded: ['cls.predictions.decoder.weight', 'cls.predictions.decoder.bias'].

I have the same issue. When resuming from a checkpoint, I get this warning: There were missing keys in the checkpoint model loaded: ['lm_head.weight']. I am not sure what the cause is, but I’m trying to figure it out.

I solved the issue on my end by setting save_safetensors=False in the TrainingArguments.
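
A rough sketch of what that setting might look like, assuming a transformers version whose TrainingArguments accepts save_safetensors. The output_dir value is a placeholder, and the arguments are kept in a plain dict so the snippet runs without the library installed:

```python
# Keyword arguments intended for transformers.TrainingArguments.
# "./results" is a placeholder path, not taken from this thread.
training_kwargs = {
    "output_dir": "./results",
    "save_safetensors": False,  # the flag described in the post above
}

# With transformers installed, these would be passed along as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
print(training_kwargs)
```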

With save_safetensors=False, loading the prior checkpoints throws errors because the pytorch_model.bin file cannot be found. Any suggestions for how to restore them?
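
One way to narrow this down is to check which weights file each older checkpoint directory actually contains. A small sketch using only the standard library; `find_weights_file` is a hypothetical helper, and it only looks for the two file names mentioned in this thread (sharded or otherwise named weights are not handled):

```python
from pathlib import Path
import tempfile

# File names as mentioned in this thread.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def find_weights_file(checkpoint_dir):
    """Return the name of the first known weights file in the directory, or None."""
    for name in WEIGHT_FILES:
        if (Path(checkpoint_dir) / name).is_file():
            return name
    return None


# Demo on a throwaway directory standing in for a checkpoint folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.safetensors").touch()
    found = find_weights_file(tmp)
    print(found)
```

If this reports model.safetensors for the prior checkpoints, they were presumably written before the flag was changed, which would be consistent with pytorch_model.bin not being found.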

I also did this and it worked. I wonder what the problem with model.safetensors is, because when I check the model.state_dict of the safetensors checkpoint, lm_head.weight is present.
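
One possible explanation (an assumption, not confirmed in this thread): if lm_head.weight is tied to the input embedding, both names refer to the same tensor, and a format that stores each tensor only once will contain just one of the names. The key could then be reported as missing from the file while still showing up in model.state_dict() once the weights are tied again. A toy illustration in plain Python, with a list standing in for a tensor and `embed_tokens.weight` as a made-up name:

```python
def save_deduplicated(state):
    """Keep only the first name seen for each shared object."""
    seen, saved = set(), {}
    for name, tensor in state.items():
        if id(tensor) in seen:
            continue  # same underlying object already stored under another name
        seen.add(id(tensor))
        saved[name] = tensor
    return saved


shared = [0.1, 0.2, 0.3]  # stand-in for one weight tensor
state = {"embed_tokens.weight": shared, "lm_head.weight": shared}

saved = save_deduplicated(state)
missing = sorted(set(state) - set(saved))
print(missing)  # ['lm_head.weight']

# Re-tying after load restores the second name without needing it in the file.
restored = dict(saved)
restored["lm_head.weight"] = restored["embed_tokens.weight"]
print(sorted(restored))
```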