Hi
I have added new tokens to my tokenizer and would like to freeze the weights of the original tokens in the input embedding layer while allowing the weights of the new tokens to be trained. This is what I've tried:
```python
existing_vocab = tokenizer.get_vocab()
for token_id in existing_vocab.values():
    if token_id < tokenizer.vocab_size - len(new_tokens):
        embedding = model.get_input_embeddings().weight[token_id]
        embedding = embedding.detach()
        embedding.requires_grad = False
```
But when I subsequently check, the weights of the original tokens are not frozen.
Any ideas how I can work around this?
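For what it's worth: indexing `weight[token_id]` and calling `.detach()` produces a new tensor, so setting `requires_grad = False` on it never touches the actual embedding parameter, and `requires_grad` can only be toggled for a whole parameter, not for individual rows. One possible workaround is to leave the embedding matrix trainable and instead mask the gradients of the original rows with a tensor hook. Below is a minimal sketch, assuming the `tokenizer`, `model`, and `new_tokens` names from the snippet above; the hook function name is just illustrative.

```python
import torch

# Rows below this index belong to the original vocabulary and should stay fixed;
# the rows appended for the new tokens remain trainable.
num_original_tokens = len(tokenizer) - len(new_tokens)

embedding_weight = model.get_input_embeddings().weight

def zero_grad_for_original_rows(grad):
    # Zero the gradient of the original rows so the optimizer never updates them.
    grad = grad.clone()
    grad[:num_original_tokens] = 0
    return grad

embedding_weight.register_hook(zero_grad_for_original_rows)
```

Two caveats: `tokenizer.vocab_size` typically does not count added tokens, so `len(tokenizer) - len(new_tokens)` is usually the safer boundary; and the hook only masks gradients, so an optimizer with weight decay or momentum (e.g. AdamW) can still nudge the "frozen" rows slightly. If that matters, one can also copy the original rows back from a saved snapshot after each `optimizer.step()`.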