Hi, I’m trying to finetune the facebook/mbart-large-50-many-to-many-mmt model for machine translation. Unfortunately, I keep maxing out my GPU memory, and even with a batch size of 1 plus gradient accumulation I can’t get training to fit.
I was looking through potential solutions and came across this thread, where pruning the embeddings was suggested as a fix. @sshleifer created issues for this here and here, but I don’t think they saw any progress.
I’m trying to do this myself right now, and was wondering whether my approach is correct (rough sketch after the list) -
- Run the tokenizer on the dataset and collect the set of unique token IDs it produces
- Copy the embeddings for just those tokens into a new, smaller embedding matrix
- Replace the embedding matrix in the model with the new one
- Build a mapping from the old token IDs to their indices in the new embedding matrix
- Run the tokenizer again, but remap the token IDs to the new indices before passing them to the model
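Here’s a rough, untested sketch of what I have in mind. `corpus` and `encode` are just placeholders for my own data and preprocessing, and it assumes the lm_head stays tied to the shared embedding (the default for this checkpoint), so `resize_token_embeddings` shrinks both at once:

```python
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# 1) Collect every token id that actually occurs in the corpus.
#    `corpus` is a placeholder for the list of source + target strings.
keep_ids = set(tokenizer.all_special_ids)  # keep pad/eos/language codes etc.
for text in corpus:
    keep_ids.update(tokenizer(text)["input_ids"])
keep_ids = sorted(keep_ids)
old2new = {old: new for new, old in enumerate(keep_ids)}

# 2)+3) Shrink the embedding matrix and copy over only the kept rows.
#    resize_token_embeddings keeps the tying with lm_head and resizes the
#    final_logits_bias buffer; we then overwrite the rows it kept.
old_weight = model.get_input_embeddings().weight.data.clone()
old_bias = model.final_logits_bias.data.clone()
model.resize_token_embeddings(len(keep_ids))
model.get_input_embeddings().weight.data.copy_(old_weight[keep_ids])
model.final_logits_bias.data.copy_(old_bias[:, keep_ids])

# 4) The model now expects the *new* ids, so config-level ids move too.
model.config.pad_token_id = old2new[tokenizer.pad_token_id]
model.config.eos_token_id = old2new[tokenizer.eos_token_id]
model.config.decoder_start_token_id = old2new[model.config.decoder_start_token_id]

# 5) Remap tokenizer output (inputs and labels) before feeding the model.
def encode(text):
    return [old2new[i] for i in tokenizer(text)["input_ids"]]
```

At generation time I’d presumably also have to remap the forced_bos_token_id for the target language code the same way.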
Does anyone here have any idea whether this could work?