Warning when loading T5 encoders

jhuang · April 8, 2021, 11:00pm

I am trying to load the t5-large encoder (the same happens with t5-small and t5-base) as a feature extractor, using the following code:

from transformers import T5EncoderModel
M = T5EncoderModel.from_pretrained('t5-large')

But I get warning messages like this:

Some weights of the model checkpoint at t5-large were not used when initializing T5EncoderModel: 
['decoder.block.0.layer.0.SelfAttention.q.weight', ....,'decoder.final_layer_norm.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5EncoderModel were not initialized from the model checkpoint 
at t5-large and are newly initialized: ['encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I understand that the warning about the decoder weights appears because I only load the encoder. However, I am a bit worried about ['encoder.embed_tokens.weight']. Is the encoder, as loaded, reliable for feature extraction?

By the way, I also found that the value of 'encoder.embed_tokens.weight' equals that of the lookup embedding layer, 'shared.weight':

torch.equal(M.state_dict()['encoder.embed_tokens.weight'], M.state_dict()['shared.weight']) == True
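Value equality alone does not show whether the two entries are one tensor or two copies. A minimal sketch of a stricter check compares the underlying storage pointers. To avoid downloading t5-large, it builds a tiny, randomly initialized T5EncoderModel from a made-up config (the sizes below are arbitrary assumptions, not those of any real checkpoint), so it only illustrates how the class wires its embeddings, not what the pre-trained values are.

```python
import torch
from transformers import T5Config, T5EncoderModel

# Tiny hypothetical config so that nothing is downloaded; sizes are arbitrary.
config = T5Config(vocab_size=32, d_model=16, d_kv=4, d_ff=32,
                  num_layers=1, num_heads=2)
model = T5EncoderModel(config)

# If both names refer to the same tensor, their storage pointers match.
same_storage = (
    model.encoder.embed_tokens.weight.data_ptr()
    == model.shared.weight.data_ptr()
)
print("encoder.embed_tokens shares storage with shared:", same_storage)
```

If the pointers match, the encoder embedding cannot differ from 'shared.weight', whatever the loader prints about it.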

Can anyone help me understand the issue?

arunwzd · May 8, 2022, 6:58am

Any update on this?

arunwzd · June 8, 2022, 3:39pm

@sgugger can you help me with this?

Reza8848 · May 15, 2023, 3:45am

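This is expected: 'encoder.embed_tokens.weight' is not loaded from the checkpoint under its own name, so the loader reports it as newly initialized. The encoder's embedding is set through self.shared, and 'shared.weight' is loaded from the pre-trained checkpoint, so the embedding still holds the pre-trained values.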

The newest version of transformers suppresses this warning.

The warning is harmless, so feel free to ignore it.

Please refer to this comment for more details.
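The mechanism can be reproduced with plain PyTorch. The sketch below is not the transformers implementation: TinyEncoder and TinyEncoderModel are hypothetical stand-ins that only mimic the idea of an encoder reusing a parent's shared embedding. Loading a checkpoint that contains only 'shared.weight' reports the encoder key as missing, even though the encoder ends up with the loaded values.

```python
import torch
from torch import nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an encoder stack that reuses an embedding."""
    def __init__(self, embed_tokens):
        super().__init__()
        self.embed_tokens = embed_tokens  # same module object as the parent's `shared`

class TinyEncoderModel(nn.Module):
    """Hypothetical stand-in for an encoder-only model with a shared embedding."""
    def __init__(self, vocab_size=10, d_model=4):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)
        self.encoder = TinyEncoder(self.shared)

torch.manual_seed(0)
# A fake "checkpoint" that stores the embedding only once, under `shared.weight`.
checkpoint = {"shared.weight": torch.randn(10, 4)}

model = TinyEncoderModel()
result = model.load_state_dict(checkpoint, strict=False)

missing = list(result.missing_keys)
tied = model.encoder.embed_tokens.weight is model.shared.weight
loaded = torch.equal(model.encoder.embed_tokens.weight, checkpoint["shared.weight"])

print("missing keys:", missing)
print("same parameter object:", tied)
print("encoder embedding equals checkpoint values:", loaded)
```

The key is reported as missing because the loader matches names, not tensors. Since both names point at one parameter, loading 'shared.weight' fills the encoder embedding as well.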
