Why do TFBlenderbotSmallModel and TFBlenderbotSmallForConditionalGeneration have the same trainable_variables?

I downloaded the model from facebook/blenderbot_small-90M and loaded it with BlenderbotSmallTokenizer.from_pretrained() and BlenderbotSmallForConditionalGeneration.from_pretrained() respectively. When I looked at the trainable variables using:

for v in model.trainable_variables:
    print(v)

I found they were equal. But the docs say there is a language modeling head in TFBlenderbotSmallForConditionalGeneration, so how can I get the weights of that head?

It is possible that the model uses tied embeddings, meaning the same embedding layer is used at the input and output of the model.

A lot of causal Transformer decoders use tied embeddings.
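You can check this yourself. Here is a minimal sketch using the PyTorch class (the generic get_input_embeddings() / get_output_embeddings() helpers are part of the Transformers API); if the print shows True, the head reuses the embedding matrix and therefore adds no extra trainable variables:

from transformers import BlenderbotSmallForConditionalGeneration

model = BlenderbotSmallForConditionalGeneration.from_pretrained(
    "facebook/blenderbot_small-90M"
)

# Input embedding matrix and LM head projection
embeddings = model.get_input_embeddings()   # shared token embedding
lm_head = model.get_output_embeddings()     # language modeling head

# With tied embeddings, both point at the very same weight tensor
print(embeddings.weight is lm_head.weight)  # expected: True if the weights are tied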

Thank you for your reply, that is helpful. But what confuses me is: if the final dense layer of the LM head in TFBlenderbotSmallForConditionalGeneration uses a kernel whose weights are shared with the embeddings, what about the bias? And another question: how can I get all the variables of the model, including both trainable and non-trainable ones? Thank you again.

if the final dense layer of the LM head in TFBlenderbotSmallForConditionalGeneration uses a kernel whose weights are shared with the embeddings, what about the bias?

Checking the PyTorch implementation, it seems that the language modeling head doesn’t use a bias, as seen here.
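You can confirm this with a quick check (reusing the PyTorch model from the sketch above; get_output_embeddings() returns the LM head linear layer):

lm_head = model.get_output_embeddings()
print(lm_head.bias)  # None: the layer was created without a bias parameter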

how can I get all the variables of the model, including both trainable and non-trainable ones?

In PyTorch, you can get all parameters of a model as follows:

for name, param in model.named_parameters():
    print(name, param.shape)
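Note that named_parameters() lists every parameter whether or not it requires gradients, but it does not include buffers. In the current PyTorch implementation, for example, final_logits_bias is registered as a buffer rather than a parameter; assuming you also want those tensors, you can list them separately:

# Buffers are tracked by the model but not returned by named_parameters()
for name, buf in model.named_buffers():
    print(name, buf.shape)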

cc’ing @Rocketknight1 for how to do this in Tensorflow.
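In the meantime, here is a minimal sketch for the TensorFlow side, assuming the standard keras.Model properties (which the TF Transformers models expose, since they subclass keras.Model):

# Trainable variables only
for v in model.trainable_variables:
    print(v.name, v.shape)

# Non-trainable variables only
for v in model.non_trainable_variables:
    print(v.name, v.shape)

# Everything, trainable and non-trainable (equivalent to model.weights)
for v in model.variables:
    print(v.name, v.shape)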

Thank you a lot, that's really helpful!