What is the `tie_word_embeddings` option exactly doing?


For some models there is this `tie_word_embeddings` parameter. I think it is for text-to-text models.
Can someone please explain what exactly this parameter does?

Many thanks

No, this is for all models that have a language modeling head (so even masked language models like BERT or causal language models like GPT-2). The idea is that the embedding weights (vocab_size by hidden_size) are tied with the decoder weights (hidden_size by vocab_size), so the model only learns one representation of the words (and that is a big matrix!).
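To illustrate, here is a minimal PyTorch sketch of what tying amounts to (the shapes and names are illustrative, not the exact internals of any particular model): the LM head simply reuses the embedding matrix, so there is only one `(vocab_size, hidden_size)` parameter tensor instead of two.

```python
import torch.nn as nn

vocab_size, hidden_size = 32000, 768

# Input embedding: maps token ids to hidden vectors.
# Its weight has shape (vocab_size, hidden_size).
embedding = nn.Embedding(vocab_size, hidden_size)

# Language modeling head: maps hidden vectors back to vocab logits.
# nn.Linear(in, out) stores its weight as (out, in), i.e. also
# (vocab_size, hidden_size), so the two matrices are shape-compatible.
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tying: point the head at the same parameter tensor as the embedding.
lm_head.weight = embedding.weight

# Both modules now share one storage; a gradient step updates both at once.
print(lm_head.weight.data_ptr() == embedding.weight.data_ptr())  # True
```

With the tie in place, training only ever touches that single matrix, which is roughly how `tie_weights()` behaves when `tie_word_embeddings=True`.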


Excuse me, maybe I misunderstand something, but according to this line:

what is the relationship between initializing the language modeling head and tying in this case?
I had just finished reading this issue when I found your discussion during my research.

Is Embedding*Embedding^{T} = I necessarily true-ish?