Hi, I’m trying to fine-tune the facebook/mbart-large-50-many-to-many-mmt model for machine translation. Unfortunately, I keep maxing out my GPU memory, and even with a batch size of 1 and gradient accumulation I cannot get it to work. I was looking through potential solutions and came across t…

@sshleifer Hi, it's been a while. I actually managed to get everything working correctly, including the tokenizer. Seeing how many hits this post has gotten and how many people have reached out to me since, I recently converted my code into a Python library which is now hosted on PyPI and supports bo…

Pruning a model embedding matrix for memory efficiency

Intermediate

sshleifer April 14, 2021, 7:10pm 2

Yes, this seems like the right approach.
When you get to step 4/5 you can just make a new Tokenizer.
If you get it working please post the solution here!
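
For readers following along, here is a minimal, framework-free sketch of the general idea in this thread: keep only the embedding rows for token ids that actually occur in your data, and remap the old ids to new, contiguous ones. This is not the poster's actual code or library. The function names (`prune_embeddings`, `remap`), the toy matrix, and the `always_keep` parameter are all hypothetical, chosen only for illustration.

```python
def prune_embeddings(embedding_rows, corpus_token_ids, always_keep=()):
    """Keep only the rows whose token ids are used (hypothetical helper).

    embedding_rows: list of rows, one per token id in the original vocabulary.
    corpus_token_ids: iterable of token ids seen in the training data.
    always_keep: ids to retain even if unseen (e.g. special tokens; assumption).
    """
    keep = sorted(set(corpus_token_ids) | set(always_keep))
    # Map each retained old id to its new, contiguous id.
    old_to_new = {old: new for new, old in enumerate(keep)}
    pruned = [embedding_rows[old] for old in keep]
    return pruned, old_to_new


def remap(token_ids, old_to_new):
    """Translate a sequence of old token ids into the pruned id space."""
    return [old_to_new[t] for t in token_ids]


# Toy stand-in for a real embedding matrix: 10 tokens, 3 dimensions.
vocab_size, dim = 10, 3
embeddings = [[float(i)] * dim for i in range(vocab_size)]

# Toy "tokenized corpus" using only a handful of the 10 ids.
corpus = [[0, 5, 7, 2], [0, 7, 7, 9, 2]]
used_ids = [t for sentence in corpus for t in sentence]

pruned, old_to_new = prune_embeddings(embeddings, used_ids, always_keep=(0, 1, 2))
remapped_first = remap(corpus[0], old_to_new)
```

With a real model, the same row selection would have to be applied to the embedding weights and to any other parameter that shares the vocabulary dimension, and the tokenizer would need to emit the new ids, which is where making a new tokenizer comes in.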

