What is the `tie_word_embeddings` option exactly doing?


For some models there is this `tie_word_embeddings` parameter. I think it is for text-to-text models.
Can someone please explain what exactly this parameter does?

Many thanks

No, this is for all models that have a language modeling head (so even masked language models like BERT or causal language models like GPT-2). The idea is that the embedding weights (vocab_size by hidden_size) are tied with the decoder weights (hidden_size by vocab_size), so the model only learns one representation of the words (and that is a big matrix!).
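To illustrate, here is a minimal PyTorch sketch of what tying amounts to (the shapes and names are illustrative, not the exact internals of any particular model): the LM head simply reuses the embedding matrix, so there is only one `(vocab_size, hidden_size)` parameter tensor instead of two.

```python
import torch.nn as nn

vocab_size, hidden_size = 32000, 768

# Input embedding: maps token ids to hidden vectors.
# Its weight has shape (vocab_size, hidden_size).
embedding = nn.Embedding(vocab_size, hidden_size)

# Language modeling head: maps hidden vectors back to vocab logits.
# nn.Linear(in, out) stores its weight as (out, in), i.e. also
# (vocab_size, hidden_size), so the two matrices are shape-compatible.
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# Tying: point the head at the same parameter tensor as the embedding.
lm_head.weight = embedding.weight

# Both modules now share one storage; a gradient step updates both at once.
print(lm_head.weight.data_ptr() == embedding.weight.data_ptr())  # True
```

With the tie in place, training only ever touches that single matrix, which is roughly how `tie_weights()` behaves when `tie_word_embeddings=True`.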


Excuse me, maybe I misunderstand something, but according to this line:

what is the relationship between initializing the language modeling head and tying in this case?
I had just finished reading this issue when I found your discussion during my research.

Is Embedding*Embedding^{T} = I necessarily true-ish?