What are your thoughts on the state-of-the-art technique for initializing embedding weight matrices? Currently, PyTorch initializes them from a normal distribution. Would Kaiming init make more sense?
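For anyone who wants to compare the options in the question, here is a minimal sketch of swapping the default init of an `nn.Embedding` for Kaiming or Xavier using the functions in `torch.nn.init`. The vocabulary size, embedding dimension, and the `relu` gain are illustrative assumptions, not values from this thread, and the sketch only shows how to apply each scheme; it does not settle which one works best.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the sketch reproducible

# Illustrative sizes (assumptions, not taken from the thread).
vocab_size, embed_dim = 1000, 64

# Default behaviour: nn.Embedding draws its weights from N(0, 1).
default_emb = nn.Embedding(vocab_size, embed_dim)

# Kaiming (He) normal init, applied in place to the embedding weight.
# For a 2-D weight of shape (vocab_size, embed_dim), PyTorch treats
# embed_dim as fan_in. The "relu" gain is an assumption for this sketch.
kaiming_emb = nn.Embedding(vocab_size, embed_dim)
nn.init.kaiming_normal_(kaiming_emb.weight, mode="fan_in", nonlinearity="relu")

# Xavier (Glorot) uniform init, applied in place.
xavier_emb = nn.Embedding(vocab_size, embed_dim)
nn.init.xavier_uniform_(xavier_emb.weight)

# Empirical standard deviation of each weight matrix, for comparison.
stds = {
    "default": default_emb.weight.std().item(),
    "kaiming": kaiming_emb.weight.std().item(),
    "xavier": xavier_emb.weight.std().item(),
}
for name, value in stds.items():
    print(f"{name:8s} std = {value:.4f}")
```

Note that the three schemes produce very different weight scales for the same shape, so any comparison between them is partly a comparison of initial embedding magnitude.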

State of the art technique for initializing Embedding Matrix?

kushaj July 17, 2020, 9:46am 3

Transformers use Xavier initialization.
So is Kaiming init preferred for the embedding matrix in an RNN, while Xavier is preferred in a Transformer? Am I correct to say this?
