I am trying to load the t5-large encoder (the same happens with t5-small and t5-base) as a feature extractor. I use the following code:
from transformers import T5EncoderModel
M = T5EncoderModel.from_pretrained('t5-large')
But I get warning messages like this:
Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel:
['decoder.block.0.layer.0.SelfAttention.q.weight', ....,'decoder.final_layer_norm.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5EncoderModel were not initialized from the model checkpoint at t5-large and are newly initialized: ['encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
I understand that the warning about the decoder weights appears because I only load the encoder. However, I am a bit worried about ['encoder.embed_tokens.weight']. Is the encoder, as loaded, reliable for feature extraction?
By the way, I also found that the parameter value of 'encoder.embed_tokens.weight' equals that of the lookup embedding layer, 'shared.weight':
torch.equal(M.state_dict()['encoder.embed_tokens.weight'], M.state_dict()['shared.weight']) == True
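To make the observation concrete, here is a toy PyTorch sketch of what I suspect might be going on. It is only a guess, not the actual transformers implementation: TinyEncoder and TinyEncoderModel are hypothetical stand-ins, and I am assuming (1) the checkpoint stores the embedding only under 'shared.weight' and (2) the model ties 'encoder.embed_tokens' to 'shared'. Under those assumptions, the loader reports the tied key as missing even though the values do come from the checkpoint.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    # Hypothetical stand-in for an encoder stack that keeps a reference
    # to the model-level embedding instead of owning its own copy.
    def __init__(self, embed_tokens):
        super().__init__()
        self.embed_tokens = embed_tokens


class TinyEncoderModel(nn.Module):
    # Hypothetical stand-in for an encoder-only model with a tied embedding.
    def __init__(self, vocab_size=10, d_model=4):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)
        # Same module object is registered twice, so both names share one weight.
        self.encoder = TinyEncoder(self.shared)


model = TinyEncoderModel()

# Assumption: the checkpoint only contains the embedding under 'shared.weight'.
checkpoint = {"shared.weight": torch.randn(10, 4)}
result = model.load_state_dict(checkpoint, strict=False)

# The tied name is reported as missing because its key is absent from the checkpoint.
missing = list(result.missing_keys)
print("missing keys:", missing)

# Both names point at the same storage, so the loaded values are visible through either.
tied = model.encoder.embed_tokens.weight.data_ptr() == model.shared.weight.data_ptr()
same_values = torch.equal(model.encoder.embed_tokens.weight, checkpoint["shared.weight"])
print("tied:", tied, "| matches checkpoint:", same_values)
```

If the real model behaves like this toy, comparing M.encoder.embed_tokens.weight.data_ptr() with M.shared.weight.data_ptr() should show whether the two names refer to the same tensor rather than to a freshly initialized one.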
Can anyone help me understand the issue?