So, I had success in getting this to work! I was able to prune the embedding matrix and LM head to less than a tenth of their original sizes. Testing a couple of Hindi-to-English translation samples, I saw no difference between the stock model's output and the pruned model's.
Btw, I'm stuck at step 4, where I need to build a new tokenizer for the vocabulary (subword-token-to-index mapping) I've generated. I'm trying to use MBart50TokenizerFast for this, and am currently using dictionaries to map old indices to new ones. I'd really appreciate it if you could point me in the right direction.
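For context on the index-remapping part, here is a minimal, framework-free sketch of the core step: build the old→new mapping from the list of kept subwords, then use it to slice the embedding rows into the new order. The `old_vocab`, `embeddings`, and `kept_tokens` values below are toy stand-ins I made up for illustration, not the real mBART vocab.

```python
# Toy stand-ins for the real tokenizer vocab and embedding weight matrix.
old_vocab = {"<s>": 0, "</s>": 1, "▁hello": 2, "▁नमस्ते": 3, "▁world": 4}
embeddings = [[float(i)] * 4 for i in range(len(old_vocab))]  # 5 x 4 toy matrix

# Subwords kept after pruning (hypothetical selection).
kept_tokens = ["<s>", "</s>", "▁hello", "▁world"]

# New vocab: same subwords, contiguous new indices starting from 0.
new_vocab = {tok: i for i, tok in enumerate(kept_tokens)}

# Old index -> new index, usable to slice both the embedding matrix and lm_head.
old_to_new = {old_vocab[tok]: new_vocab[tok] for tok in kept_tokens}

# Pruned embedding matrix: rows reordered to follow the *new* indices.
pruned = [embeddings[old] for old, _ in sorted(old_to_new.items(), key=lambda kv: kv[1])]

print(new_vocab)    # {'<s>': 0, '</s>': 1, '▁hello': 2, '▁world': 3}
print(len(pruned))  # 4
```

For the tokenizer itself, one route that may be worth exploring (I haven't verified it end to end for MBart50TokenizerFast specifically) is to edit the vocab inside the fast tokenizer's serialized `tokenizer.json` (accessible via `tokenizer.backend_tokenizer`) so it emits the new indices directly, rather than remapping through dictionaries at inference time.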