Set_input_embeddings() values not being saved with save_pretrained()

I’m preloading the embeddings on an untrained gpt2 model. This works fine and the model trains well, but after saving and reloading, the model doesn’t contain any of the inserted weights. Here’s some simplified code that doesn’t require training to demonstrate the issue…

#!/usr/bin/env python3
import torch
import numpy
from   transformers import AutoConfig, AutoModelForCausalLM

if __name__ == '__main__':
    model_name = 'gpt2' # 'bert-base-cased'
    # Build the untrained model from config
    model_config = AutoConfig.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_config(model_config)
    print('Original weights', model.get_input_embeddings().weight[0][:5])
    # Load embeddings to use in the model
    embed_weights = AutoModelForCausalLM.from_pretrained(model_name).get_input_embeddings().weight.detach().numpy()
    # embed_weights = numpy.load('data/embeddings/gpt2_input_embeddings.npz')['embed_weights']
    print('Loaded embed_weights', embed_weights[0][:5])
    embed_module  = torch.nn.Embedding(embed_weights.shape[0], embed_weights.shape[1],
                                       _weight=torch.from_numpy(embed_weights), _freeze=True)
    model.set_input_embeddings(embed_module)
    print('Modified embeds', model.get_input_embeddings().weight[0][:5])
    # Save the model
    save_directory = '/tmp/custom_model'
    model.save_pretrained(save_directory)
    # Reload the model
    model_reloaded = AutoModelForCausalLM.from_pretrained(save_directory)
    print('Reloaded embeds', model_reloaded.get_input_embeddings().weight[0][:5])

Here are the results…

Original weights tensor([ 0.0099,  0.0235,  0.0178, -0.0249, -0.0010], grad_fn=<SliceBackward0>)
Loaded embed_weights [-0.11010301 -0.03926672  0.03310751  0.13382645 -0.04847569]
Modified embeds tensor([-0.1101, -0.0393,  0.0331,  0.1338, -0.0485])
Reloaded embeds tensor([ 0.0099,  0.0235,  0.0178, -0.0249, -0.0010], grad_fn=<SliceBackward0>)

As you can see, the model internally shows the modified weight values, but after saving and reloading it’s back to the original randomly initialized values.

Note that I can manually re-load the initial embedding weights into my trained model after the fact and it works fine. This tells me the model I’m training with has those values correctly preloaded, but I’m not sure why they aren’t getting saved to disk.
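
For reference, the manual re-load workaround looks roughly like this (re-using the .npz file from the commented-out line in the script above):

import numpy
import torch
from transformers import AutoModelForCausalLM

# Reload the saved model, then re-apply the preloaded embedding weights by hand
model_reloaded = AutoModelForCausalLM.from_pretrained('/tmp/custom_model')
embed_weights  = numpy.load('data/embeddings/gpt2_input_embeddings.npz')['embed_weights']
embed_module   = torch.nn.Embedding(embed_weights.shape[0], embed_weights.shape[1],
                                    _weight=torch.from_numpy(embed_weights), _freeze=True)
model_reloaded.set_input_embeddings(embed_module)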

BTW… After some testing, I see that the above code works in transformers 4.29 but fails in 4.30, so this looks like a bug in the library.

Hi,

This is because GPT-2 uses tied word embeddings, meaning that the weights of the input and output embeddings are shared by default:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("gpt2")
print(config.tie_word_embeddings)

which prints True.

Hence, if you want separate input and output embedding matrices, make sure to also set it to False:

config = AutoConfig.from_pretrained("gpt2", tie_word_embeddings=False)
model = AutoModelForCausalLM.from_config(config)
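
With that change, the custom input embeddings should survive the save/reload round trip, e.g. (re-using embed_module and save_directory from your first snippet):

# With tie_word_embeddings=False, the input embedding matrix is saved on its own
model.set_input_embeddings(embed_module)
model.save_pretrained(save_directory)
model_reloaded = AutoModelForCausalLM.from_pretrained(save_directory)
# Should now print the loaded values rather than fresh random ones
print(model_reloaded.get_input_embeddings().weight[0][:5])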

Thanks for the feedback. I didn’t consider the tied embeddings. I would have thought that if they were tied, setting one would change the other as well, but apparently that’s not the case.

Interesting that the behavior has changed between transformers versions. In 4.29 it looks like the values of the input embeddings are saved, but in 4.36 it’s the values of the output embeddings that appear in the saved model, regardless of the change to the inputs.

I really want to keep them tied, so calling both set_input_embeddings(embed_module) and set_output_embeddings(embed_module) with the same module looks like the safe way to go.

As an edit to the above, it looks like the correct (and simplest) way to do this is…

model.set_input_embeddings(embed_module)
model.tie_weights()

The call to set_output_embeddings() requires an nn.Linear() layer, whereas set_input_embeddings() takes an nn.Embedding() module. Setting the inputs and then calling tie_weights() takes care of re-tying the output Linear layer’s weight to the input Embedding module’s weight.
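
Putting it together, a minimal sketch of the tied-weights approach (assuming the same model, embed_module and save_directory as in the original script):

# Replace the input Embedding, then re-tie so the output Linear layer
# shares the same weight tensor before saving
model.set_input_embeddings(embed_module)
model.tie_weights()
model.save_pretrained(save_directory)

model_reloaded = AutoModelForCausalLM.from_pretrained(save_directory)
print(model_reloaded.get_input_embeddings().weight[0][:5])   # should show the loaded values
print(model_reloaded.get_output_embeddings().weight[0][:5])  # same values, since they stay tied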