Trainer's `save_model` isn't saving the entire state_dict and is only saving the embedding/encoder

I currently have a fairly basic model that consists of a pre-trained backbone with an MLP head. Think something like:

from transformers import AutoModel, PreTrainedModel

class Model(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)  # required when subclassing PreTrainedModel
        self.backbone = AutoModel.from_pretrained(config.model_name_or_path)
        self.mlp = MLPLayer()  # custom head (definition omitted)

    ...

I’m currently using the Trainer object to save my model. Specifically, the code I’m using calls self.save_model(output_dir) inside the training loop, which writes the checkpoint as a safetensors file.
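For context, the saving side looks roughly like this (the dataset name and output paths are placeholders, not my exact setup):

from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="./checkpoints")
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
trainer.save_model("./checkpoints/final")  # writes model.safetensors (and config) via save_pretrained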

When I try to load it with model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS), I’m noticing that the keys for the MLP layer are missing; only the keys for the backbone model’s embedding and encoder layers are there.
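One way to confirm what actually ended up on disk, independent of how the checkpoint is loaded, is to list the tensor keys in the safetensors file directly (a diagnostic sketch; the path is a placeholder):

from safetensors.torch import load_file

state_dict = load_file("PATH_TO_SAFETENSORS/model.safetensors")
print(sorted(state_dict.keys()))  # check whether any mlp.* keys were written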

I took a look at the source code for save_model, which delegates to the _save method, and I don’t see any reason why the MLP layers shouldn’t be saved. Both _save and save_pretrained use the model’s state_dict, which does contain the MLP layer’s weights.

Is there anything that I may be missing or may have configured incorrectly? Thanks.

Well, the mistake was embarrassingly simple. It turns out that if I don’t instantiate the exact model class, only the keys that exist in that class’s architecture get loaded.

That means that because I was calling model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS) instead of model = Model.from_pretrained(PATH_TO_SAFETENSORS), the code was (correctly) loading only the state_dict entries for the embedding and encoder layers, since my MLP layer doesn’t exist in the architecture AutoModel resolves to.
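In other words, the fix is just to load through the custom class so its extra modules exist before the state_dict is applied (a minimal sketch; the path is a placeholder):

# AutoModel resolves to the backbone architecture only, so the mlp.* keys
# in the checkpoint have no matching parameters and are dropped:
# model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS)

# The custom class defines both the backbone and the MLP head,
# so every key in the checkpoint finds a matching parameter:
model = Model.from_pretrained(PATH_TO_SAFETENSORS)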

I do think a warning message of some kind would be helpful here, but I’m just leaving this here for others to find.