How to train new token embeddings to add to a pretrained model?

Hello,

I would like to take a pretrained model and train only new embeddings on a corpus, leaving the rest of the transformer untouched. Then I want to fine-tune on a task without changing the original embeddings, and finally swap the embeddings. In short, how can I control training only the embeddings, keeping the embeddings frozen during training, and swapping the embeddings of a model with the Hugging Face Transformers library?
This follows the approach taken in this article:

  1. Pre-train a monolingual BERT (i.e. a transformer) in L1 with masked language modeling (MLM) and next sentence prediction (NSP) objectives on an unlabeled L1 corpus.
  2. Transfer the model to a new language by learning new token embeddings while freezing the transformer body, with the same training objectives (MLM and NSP), on an unlabeled L2 corpus.
  3. Fine-tune the transformer for a downstream task using labeled data in L1, while keeping the L1 token embeddings frozen.
  4. Zero-shot transfer the resulting model to L2 by swapping the L1 token embeddings with the L2 embeddings learned in Step 2.

Thank you!

Well, you answered your own question. You can freeze layers in PyTorch by setting requires_grad=False on a layer’s parameters; they will not be updated during training. You can then load the model, swap out the weights of the embedding layer with other learnt weights, and save the model again (in Transformers you can use model.save_pretrained()).
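Something along these lines should work. This is an untested sketch: the checkpoint name, the "word_embeddings" parameter-name filter and the output path are assumptions based on BERT's parameter names, so adapt them to your model.

```python
import torch
from transformers import AutoModelForMaskedLM

# Hypothetical L1 checkpoint; substitute your own pretrained model.
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# Step 2: train only the token embeddings, freeze the transformer body.
# "word_embeddings" matches BERT's token embedding parameter name.
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" in name
# ... run your MLM training loop / Trainer on the L2 corpus here ...
# (if the L2 tokenizer has a different vocabulary size, call
#  model.resize_token_embeddings(new_vocab_size) before training)

# Keep a copy of the learned L2 token embeddings.
l2_embeddings = model.get_input_embeddings().weight.detach().clone()

# Step 3: the reverse, freeze only the token embeddings while fine-tuning on L1.
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" not in name

# Step 4: swap in the L2 embeddings and save the resulting model.
with torch.no_grad():
    model.get_input_embeddings().weight.copy_(l2_embeddings)
model.save_pretrained("l2-model")
```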

I am not sure how much help you need. If you need a step-by-step guide, I fear I do not have the time to help with that. The above should help you a bit.
