State of the art technique for initializing Embedding Matrix?

Based on BERT's `init_weights`, BERT initializes the weights of Linear and Embedding layers from a normal distribution with mean 0 and std 0.02 (the `initializer_range` config value).
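A minimal sketch of that BERT-style init in PyTorch (function name and the toy model are mine, not from the BERT source):

```python
import torch.nn as nn

def init_weights(module, std=0.02):
    # BERT-style init: Normal(mean=0, std=0.02) for Linear and Embedding weights
    if isinstance(module, (nn.Linear, nn.Embedding)):
        module.weight.data.normal_(mean=0.0, std=std)
        if isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.data.zero_()

# Example usage: apply recursively to every submodule
model = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 32))
model.apply(init_weights)
```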

BTW, I tried Kaiming initialization (PyTorch's default for Linear layers) on both the Linear and Embedding layers of a 2-layer transformer on my toy task, and it gave slightly better performance. I wouldn't claim it's definitively better than Xavier, but it's definitely worth trying.
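For reference, here is a sketch of what "Kaiming on Linear and Embedding" could look like. PyTorch's default Linear init is `kaiming_uniform_` with `a=sqrt(5)`; extending it to the embedding matrix (treating it like a Linear weight) is my assumption about the setup, not something PyTorch does by default (Embedding defaults to `N(0, 1)`):

```python
import math
import torch.nn as nn

def kaiming_init(module):
    # Kaiming-uniform with a=sqrt(5), matching nn.Linear's default reset_parameters;
    # applied to Embedding weights as well (fan_in is taken from the last dim)
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.kaiming_uniform_(module.weight, a=math.sqrt(5))
        if isinstance(module, nn.Linear) and module.bias is not None:
            nn.init.zeros_(module.bias)

emb = nn.Embedding(1000, 64)
kaiming_init(emb)
```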
