Hello,
I just want to use the first 5 layers of DistilBERT as a frozen encoder (no fine-tuning) and only fine-tune the last layer plus my own model, to save memory. As shown in the code below, I removed the first 5 layers, which were only used to generate the hidden states. Now that I have the hidden states from the encoder, is it correct to feed the hidden states from the 5th layer directly as inputs_embeds to a BERT/DistilBERT model? Thanks!
import torch
import torch.nn as nn
from transformers import DistilBertModel

hidden_states = torch.load('......pt')  # the output of the 5th layer of the DistilBERT encoder

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.distilbert = DistilBertModel.from_pretrained('distilbert-base-uncased')

    def forward(self, hidden_state, mask):
        distilbert_output = self.distilbert(inputs_embeds=hidden_state,
                                            attention_mask=mask,
                                            return_dict=False)
        hidden_state = distilbert_output[0]   # last-layer hidden states
        pooled_output = hidden_state[:, 0]    # take the [CLS] position
        x = pooled_output
        # ...... self-defined classification layers
# Remove the unnecessary layers from DistilBERT: keep only the last layer,
# since the first 5 were already applied offline to produce hidden_states
model = Net()
num_kept_layers = 1  # number of final layers to keep
encoder_layers = model.distilbert.transformer.layer[-num_kept_layers:]
model.distilbert.transformer.layer = nn.ModuleList(encoder_layers)
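
For context, here is a minimal sketch of how the 5th-layer hidden states could have been precomputed and saved, assuming the standard distilbert-base-uncased checkpoint and using output_hidden_states=True (the filename encoder_output.pt is just a placeholder for illustration):

import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
encoder = DistilBertModel.from_pretrained('distilbert-base-uncased')
encoder.eval()  # the first 5 layers are frozen, so no gradients are needed

inputs = tokenizer(["example sentence"], return_tensors='pt',
                   padding=True, truncation=True)

with torch.no_grad():
    outputs = encoder(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple of 7 tensors for distilbert-base:
# index 0 is the embedding output, indices 1-6 are the 6 layer outputs,
# so index 5 is the output of the 5th transformer layer
fifth_layer_states = outputs.hidden_states[5]
torch.save(fifth_layer_states, 'encoder_output.pt')  # placeholder filename

The saved tensor can then be loaded and passed, together with the matching attention mask, into Net.forward as shown above.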