What is the `tie_word_embeddings` option exactly doing?


For some models there is a `tie_word_embeddings` parameter. I think it is for text-to-text models.
Can someone please explain what exactly this parameter does?

Many thanks

No, this is for all models that have a language modeling head (so even masked language models like BERT or causal language models like GPT-2). The idea is that the input embedding matrix (vocab_size by hidden_size) is tied with (shared by) the output projection that maps hidden states back to vocabulary logits, so the model only learns one representation of the words instead of two (that is a big matrix!).
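
To make this concrete, here is a minimal PyTorch sketch of what tying amounts to. The sizes and variable names are illustrative (BERT-base-like), not the actual Transformers internals; note that `nn.Linear(hidden_size, vocab_size)` stores its weight as `(vocab_size, hidden_size)`, which is why the two matrices can be shared directly:

```python
import torch.nn as nn

vocab_size, hidden_size = 30522, 768  # illustrative, BERT-base-like sizes

# Input embedding: token ids -> hidden vectors; weight is (vocab_size, hidden_size)
embedding = nn.Embedding(vocab_size, hidden_size)

# LM head: hidden vectors -> vocab logits; nn.Linear also stores its
# weight as (vocab_size, hidden_size), so the shapes match
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tying: the head now shares the embedding matrix instead of learning its own
lm_head.weight = embedding.weight

# Both modules point at the same underlying tensor
assert lm_head.weight.data_ptr() == embedding.weight.data_ptr()
```

With these sizes that single shared matrix has 30522 × 768 ≈ 23M parameters, so tying saves that many parameters compared to learning the embedding and the head separately.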