Question about last_hidden_state of the BERT model

Hi, everyone!

from transformers import AutoConfig
from transformers.models.bert.modeling_bert import BertEncoder

bert_config = AutoConfig.from_pretrained('bert-large-uncased')
self.bert = BertEncoder(bert_config)  # encoder stack only, no embedding layer or pooler
sequence_output = self.bert(embedding_output).last_hidden_state
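
In case anyone wants to reproduce this, here is a self-contained sketch of the same setup outside my model class (the random embedding_output and its shape are just placeholders, not my real inputs):

import torch
from transformers import AutoConfig
from transformers.models.bert.modeling_bert import BertEncoder

bert_config = AutoConfig.from_pretrained('bert-large-uncased')
encoder = BertEncoder(bert_config)

# Placeholder for the embeddings my model computes upstream.
embedding_output = torch.randn(2, 16, bert_config.hidden_size)

sequence_output = encoder(embedding_output).last_hidden_state
print(sequence_output.shape)  # torch.Size([2, 16, 1024])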

After training BERT from scratch, I found while debugging that sequence_output (shape: [batch_size, seq_len, hidden_size]) holds the same vector at every position, like this:

tensor([[[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]],

        [[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]],

        [[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]],

        ...,

        [[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]],

        [[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]],

        [[ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         ...,
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338],
         [ 0.4257, -0.5848,  1.6611,  ...,  0.5865, -0.3086,  1.7338]]],
       device='cuda:0')
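
A quick sanity check confirms that every position really does hold the same vector. Here sequence_output is a stand-in tensor built to match the dump above; the shapes are placeholders:

import torch

# Stand-in for the collapsed output: one vector repeated across the
# batch and sequence dimensions, as in the printout above.
sequence_output = torch.randn(1, 1, 1024).expand(8, 128, 1024)

# Compare every position against the vector at position 0 of the first example.
reference = sequence_output[:1, :1, :].expand_as(sequence_output)
print(torch.allclose(sequence_output, reference))  # True -> all token vectors identical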

I would like to know what might be causing this. Thank you very much! :grinning: