I’m preloading the embeddings of an untrained gpt2 model. This works fine and the model trains well, but after saving and reloading, the model no longer contains any of the inserted weights. Here’s some simplified code that doesn’t require training to demonstrate the issue…
#!/usr/bin/env python3
import torch
import numpy
from transformers import AutoConfig, AutoModelForCausalLM
if __name__ == '__main__':
    model_name = 'gpt2'  # 'bert-base-cased'
    # Build the untrained model from config
    model_config = AutoConfig.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_config(model_config)
    print('Original weights', model.get_input_embeddings().weight[0][:5])
    # Load embeddings to use in the model
    embed_weights = AutoModelForCausalLM.from_pretrained(model_name).get_input_embeddings().weight.detach().numpy()
    # embed_weights = numpy.load('data/embeddings/gpt2_input_embeddings.npz')['embed_weights']
    print('Loaded embed_weights', embed_weights[0][:5])
    embed_module = torch.nn.Embedding(embed_weights.shape[0], embed_weights.shape[1],
                                      _weight=torch.from_numpy(embed_weights), _freeze=True)
    model.set_input_embeddings(embed_module)
    print('Modified embeds', model.get_input_embeddings().weight[0][:5])
    # Save the model
    save_directory = '/tmp/custom_model'
    model.save_pretrained(save_directory)
    # Reload the model
    model_reloaded = AutoModelForCausalLM.from_pretrained(save_directory)
    print('Reloaded embeds', model_reloaded.get_input_embeddings().weight[0][:5])
Here are the results…
Original weights tensor([ 0.0099, 0.0235, 0.0178, -0.0249, -0.0010], grad_fn=<SliceBackward0>)
Loaded embed_weights [-0.11010301 -0.03926672 0.03310751 0.13382645 -0.04847569]
Modified embeds tensor([-0.1101, -0.0393, 0.0331, 0.1338, -0.0485])
Reloaded embeds tensor([ 0.0099, 0.0235, 0.0178, -0.0249, -0.0010], grad_fn=<SliceBackward0>)
As you can see, the model shows the modified weight values internally, but after saving and reloading it’s back to the original randomly initialized values.
Note that I can manually re-load the initial embedding weights into my trained model after the fact and it works fine. This tells me the model I’m training with has those values correctly preloaded, but I’m not sure why they aren’t getting saved to disk.
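For reference, that manual workaround looks roughly like this (a sketch reusing embed_weights and save_directory from the script above)…

model_reloaded = AutoModelForCausalLM.from_pretrained(save_directory)
# Re-insert the pretrained embeddings that were lost on save/reload
embed_module = torch.nn.Embedding(embed_weights.shape[0], embed_weights.shape[1],
                                  _weight=torch.from_numpy(embed_weights), _freeze=True)
model_reloaded.set_input_embeddings(embed_module)
print('Patched embeds', model_reloaded.get_input_embeddings().weight[0][:5])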
BTW… After some testing, I see that the above code works in transformers 4.29 but fails in 4.30, so this looks like a bug in the library.
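In case it helps anyone reproduce, here’s a quick way to check what actually lands on disk (a sketch; this assumes the default pytorch_model.bin serialization, as newer versions may write model.safetensors instead)…

import os
state_dict = torch.load(os.path.join(save_directory, 'pytorch_model.bin'))
# GPT-2 ties transformer.wte.weight to lm_head.weight, so check both keys
for key in ('transformer.wte.weight', 'lm_head.weight'):
    if key in state_dict:
        print(key, state_dict[key][0][:5])
    else:
        print(key, 'not present in the saved state dict')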