Warning when loading T5 encoders

I am trying to load the t5-large encoder (same for t5-small and t5-base) as a feature extractor. I use the following command:

from transformers import T5EncoderModel
M = T5EncoderModel.from_pretrained('t5-large')

But I get warning messages like this:

Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel: 
['decoder.block.0.layer.0.SelfAttention.q.weight', ....,'decoder.final_layer_norm.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5EncoderModel were not initialized from the model checkpoint at t5-large and are newly initialized: ['encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I understand that the warning about the decoder weights appears because I only load the encoder part. However, I am a bit worried about ['encoder.embed_tokens.weight']. Is the encoder, as loaded, reliable for feature extraction?

By the way, I also found that the parameter value of 'encoder.embed_tokens.weight' equals that of the look-up embedding layer, 'shared.weight':

torch.equal(M.state_dict()['encoder.embed_tokens.weight'], M.state_dict()['shared.weight'])  # returns True

Can anyone help me understand this issue?


Any update on this?

@sgugger can you help me with this?

This is normal behavior: 'encoder.embed_tokens.weight' has no separate entry in the checkpoint, so it is reported as newly initialized; in practice it is tied to self.shared, the shared embedding, which is loaded from the pretrained weights. That is also why your torch.equal check returns True.
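
A minimal sketch (assuming the same t5-large checkpoint as above) to confirm that the encoder embedding is tied to the shared embedding rather than left random:

from transformers import T5EncoderModel

M = T5EncoderModel.from_pretrained('t5-large')

# The encoder's input embedding is the shared embedding module itself,
# so it carries the pretrained weights even though the checkpoint has no
# separate 'encoder.embed_tokens.weight' entry.
print(M.get_input_embeddings() is M.shared)  # True
print(M.encoder.embed_tokens.weight.data_ptr() == M.shared.weight.data_ptr())  # True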

This warning is suppressed in newer versions of the transformers library.

In any case, it is harmless, so feel free to ignore it.

Please refer to this comment for more details.
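
For completeness, a minimal sketch of using the loaded encoder as a feature extractor (the example sentence and the mean-pooling over the last hidden state are just illustrative choices):

import torch
from transformers import T5Tokenizer, T5EncoderModel

tok = T5Tokenizer.from_pretrained('t5-large')
M = T5EncoderModel.from_pretrained('t5-large')
M.eval()

inputs = tok("Feature extraction with the T5 encoder.", return_tensors="pt")
with torch.no_grad():
    out = M(**inputs)

# out.last_hidden_state has shape (batch, seq_len, d_model);
# mean-pool over tokens to get one vector per sentence.
features = out.last_hidden_state.mean(dim=1)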