`resize_token_embeddings` for performance

I read the blog post "Initializing New Word Embeddings for Pretrained Language Models" by John Hewitt and understand the idea of resizing with mean initialization for fine-tuning. However, if the resize is only for performance purposes (e.g. padding the vocabulary size to a multiple for better GPU throughput) and we only run inference with the original checkpoint untouched, is resizing with mean init enough? How does transformers guarantee that the padding positions are never sampled?
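To make the question concrete, here is a minimal numpy sketch of what I mean (this is my own toy illustration, not the transformers internals): the embedding matrix is padded with rows initialized to the mean of the existing embeddings, and at inference the logits of the dummy rows are masked so they cannot be sampled. (For reference, recent transformers versions expose `pad_to_multiple_of` and, if I recall correctly, a `mean_resizing` flag on `resize_token_embeddings`.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding matrix: vocab of 6 real tokens, hidden size 4.
vocab_size, hidden = 6, 4
emb = rng.normal(size=(vocab_size, hidden))

# Resize the vocab dimension up to 8 (as pad_to_multiple_of might)
# and initialize the new rows with the mean of the existing rows,
# in the spirit of Hewitt's mean-initialization idea.
new_vocab_size = 8
mean_vec = emb.mean(axis=0)
pad_rows = np.tile(mean_vec, (new_vocab_size - vocab_size, 1))
resized = np.vstack([emb, pad_rows])

# At inference, one way to guarantee the dummy ids are never sampled
# is to mask their logits to -inf before the softmax.
logits = resized @ rng.normal(size=hidden)  # fake next-token logits
logits[vocab_size:] = -np.inf
probs = np.exp(logits - logits[:vocab_size].max())
probs /= probs.sum()
print(probs[vocab_size:])  # padding positions get zero probability
```

My worry is whether this explicit masking is actually needed, or whether the library (or the mean-initialized rows themselves) already makes sampling a padding id effectively impossible.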
