I currently have a very basic model that consists of a pre-trained backbone with an MLP head. Think something like:
from transformers import AutoModel, PreTrainedModel

class Model(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)  # required when subclassing PreTrainedModel
        self.backbone = AutoModel.from_pretrained(config.model_name_or_path)
        self.mlp = MLPLayer()
        ...
I’m currently using the Trainer object to save my model. Specifically, the code I’m using calls self.save_model(output_dir) inside the training loop, which saves the checkpoint as a safetensors file.
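For context, the overall setup looks roughly like this (the output paths, dataset, and arguments below are placeholders, not my exact configuration):

from transformers import Trainer, TrainingArguments

# Placeholder setup -- my actual arguments and dataset differ
training_args = TrainingArguments(output_dir="./checkpoints")
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()

# This is the call that writes model.safetensors into the output directory
trainer.save_model("./checkpoints/final")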
When I try to load it using model = AutoModel.from_pretrained(PATH_TO_SAFETENSORS), I notice that the keys for the MLP layer are missing and only the keys for the backbone model’s embedding and encoder layers are there.
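To double-check, I listed the tensor names stored in the checkpoint directly (the path below is illustrative; this assumes the safetensors package is installed):

from safetensors import safe_open

# Inspect which tensors actually made it into the saved file
with safe_open("./checkpoints/final/model.safetensors", framework="pt") as f:
    saved_keys = list(f.keys())

print([k for k in saved_keys if "mlp" in k])  # comes back empty for me
print(saved_keys[:5])                         # only embedding/encoder keys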
I took a look at the source code for save_model, which seems to delegate to the _save method, and I don’t see any reason why the MLP layers shouldn’t be saved. Both _save and save_pretrained use the model’s state_dict, which contains the MLP layer’s weights.
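That matches what I see when I print the in-memory state dict before saving (the key prefixes below are simplified from my model):

# Before saving, the state dict clearly contains both sets of weights
all_keys = model.state_dict().keys()
print([k for k in all_keys if k.startswith("mlp")])       # MLP weights present
print([k for k in all_keys if k.startswith("backbone")])  # backbone weights present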
Is there anything that I may be missing or may have configured incorrectly? Thanks.