I am trying to load the t5-large encoder (the same happens for t5-small and t5-base) as a feature extractor. I use the following command:
from transformers import T5EncoderModel
M = T5EncoderModel.from_pretrained('t5-large')
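For context, here is roughly how I plan to use the encoder for feature extraction (a minimal sketch; the sample sentence and the mean-pooling step are just my own illustration):

from transformers import T5Tokenizer, T5EncoderModel
import torch

tokenizer = T5Tokenizer.from_pretrained('t5-large')
M = T5EncoderModel.from_pretrained('t5-large')
M.eval()

inputs = tokenizer('A sample sentence to encode.', return_tensors='pt')
with torch.no_grad():
    outputs = M(**inputs)

# last_hidden_state has shape (batch, seq_len, d_model); mean-pooling
# over tokens is one simple way to get a fixed-size feature vector.
features = outputs.last_hidden_state.mean(dim=1)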
But I get warning messages like this:
Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel: ['decoder.block.0.layer.0.SelfAttention.q.weight', ..., 'decoder.final_layer_norm.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5EncoderModel were not initialized from the model checkpoint at t5-large and are newly initialized: ['encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
I understand that the warning about the decoder weights appears because I only load the encoder part. However, I am a bit worried about ['encoder.embed_tokens.weight']. Is the encoder, as loaded, reliable for feature extraction?
By the way, I also find that the parameter value of 'encoder.embed_tokens.weight' equals that of the shared look-up embedding layer, 'shared.weight':
import torch
torch.equal(M.state_dict()['encoder.embed_tokens.weight'],
            M.state_dict()['shared.weight'])  # returns True
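To rule out a coincidental copy, a further check I could run (my own addition, not from the docs) is whether the two entries actually share storage, i.e. whether the embedding is weight-tied, and whether the shared embedding matches the full pretrained checkpoint:

# If True, 'encoder.embed_tokens.weight' is the tied 'shared.weight'
# tensor itself, not an independently initialized parameter.
print(M.encoder.embed_tokens.weight.data_ptr() == M.shared.weight.data_ptr())

# Comparing against the full seq2seq model would show whether the shared
# embedding carries the pretrained values rather than fresh ones.
from transformers import T5Model
full = T5Model.from_pretrained('t5-large')
print(torch.equal(M.shared.weight, full.shared.weight))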
Can anyone help me understand this issue?