Pruning a model embedding matrix for memory efficiency

Okay, so I’ve worked everything out except the tokenizer. The model can be pruned and trained to perform quite well. Like I said above, I was getting extremely bad results, but it turns out that was due to my learning rate of 1e-5 being too high. I finally settled on a learning rate of 1e-8, and the model now actually converges. I suspect that adding an lr scheduler with warmup, like the fairseq models use, would resolve this issue, but I’m not sure how to do that with the Seq2SeqTrainer yet.
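As far as I can tell, warmup and the schedule are just configured through the training arguments rather than the trainer itself. A minimal sketch of what I mean (the output directory, hyperparameter values, and the `model` / `train_dataset` / `eval_dataset` / `tokenizer` objects are placeholders, and recent transformers versions may also offer a fairseq-style inverse-sqrt schedule):

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="pruned-mbart",    # placeholder output directory
    learning_rate=1e-5,           # peak lr reached after warmup
    lr_scheduler_type="linear",   # decay schedule after warmup
    warmup_steps=4000,            # linear warmup before decay
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,                  # the pruned model (defined elsewhere)
    args=training_args,
    train_dataset=train_dataset,  # assumed to exist
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
```

I haven’t verified yet whether warmup alone lets me go back to a higher peak learning rate, but that’s the direction I’d try.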

I still don’t know how to create a new tokenizer, but for the time being I’ve just defined a custom tokenizer that inherits from the main MBart50TokenizerFast class and adds three functions: one to build the mapping from the old dictionary to the pruned dictionary, and two to encode and decode using this new dictionary. Roughly, it looks like the sketch below. This may not be the “correct” way (which would be producing a new sentencepiece model), but it works well enough in my opinion. I’m still trying to figure out how to do that properly, but haven’t managed to yet.
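Something like this (the class and method names are simplified, and `kept_ids` is my name for the list of original vocabulary ids that survive pruning, in the order they appear in the pruned embedding matrix):

```python
from transformers import MBart50TokenizerFast

class PrunedMBart50Tokenizer(MBart50TokenizerFast):
    """MBart50 tokenizer that remaps token ids into a pruned vocabulary."""

    def set_pruned_vocab(self, kept_ids):
        # kept_ids: original vocabulary ids kept after pruning, in the
        # order they appear in the pruned embedding matrix.
        self.old_to_new = {old: new for new, old in enumerate(kept_ids)}
        self.new_to_old = {new: old for old, new in self.old_to_new.items()}

    def encode_pruned(self, text, **kwargs):
        # Encode with the full tokenizer, then remap into the pruned id space;
        # anything that was pruned away falls back to <unk> (assumed to be kept).
        unk = self.old_to_new[self.unk_token_id]
        return [self.old_to_new.get(i, unk) for i in super().encode(text, **kwargs)]

    def decode_pruned(self, pruned_ids, **kwargs):
        # Map pruned ids back to the original ids before decoding.
        return super().decode([self.new_to_old[i] for i in pruned_ids], **kwargs)
```

Usage is then just the normal `from_pretrained` plus one extra call, e.g.:

```python
tokenizer = PrunedMBart50Tokenizer.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="ro_RO"
)
tokenizer.set_pruned_vocab(kept_ids)  # kept_ids computed when pruning the embeddings
```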

I would like to upload the pruned and fine-tuned model to the Model Hub, but I’m unsure how that can be done without making a new sentencepiece model.