Pruning a model embedding matrix for memory efficiency

Hi Aditya Srivastava,

Could you share your code for pruning the embedding matrix and the lm head?

The weights of the input embedding and the lm head seem to be shared, and I don't know the correct way to change them while keeping this constraint.

import torch
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
old_embedding = model.get_input_embeddings()
# ...select embeddings for some tokens (a random 1000-row matrix as a stand-in here)
new_embedding = torch.nn.Embedding.from_pretrained(torch.rand(1000, 768))
model.set_input_embeddings(new_embedding)

print(model.lm_head.weight.shape)
# Expect: [1000, 768]  Actual: [250112, 768]
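
For reference, here is a minimal sketch of the direction I've been trying, assuming the lm head just needs the same rows selected and the tying re-applied afterwards via model.tie_weights(). Note that keep_ids is a placeholder for the token ids I actually want to keep:

import torch
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

# Placeholder: ids of the tokens to keep (first 1000 used for illustration).
keep_ids = torch.arange(1000)

# Prune the input embedding by copying only the kept rows.
old_embedding = model.get_input_embeddings()
new_embedding = torch.nn.Embedding.from_pretrained(
    old_embedding.weight[keep_ids].clone(), freeze=False
)
model.set_input_embeddings(new_embedding)

# set_input_embeddings does not touch the lm head, so prune it the same way.
old_lm_head = model.get_output_embeddings()
new_lm_head = torch.nn.Linear(old_lm_head.in_features, len(keep_ids), bias=False)
new_lm_head.weight.data = old_lm_head.weight.data[keep_ids].clone()
model.set_output_embeddings(new_lm_head)

# Keep the config consistent and re-apply tying if the config asks for it.
model.config.vocab_size = len(keep_ids)
model.tie_weights()

print(model.lm_head.weight.shape)  # torch.Size([1000, 768])

Is this the right approach, or does tie_weights() end up overwriting the pruned head?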