How to use the output of the first several layers as the input of the last few layers in BERT/DistilBERT

JB28666 · June 10, 2023, 4:20pm

Hello, I want to use the output of the first several layers as the input of the last few layers in BERT/DistilBERT. For example, in BERT, I want to first get the output of the 6th layer, then use this output as input to a new, modified BERT model that only has the last 6 layers of the original BERT. I found that I can get the output hidden states of each layer. I am wondering whether I have to convert them to input_ids and attention_mask to feed them into my modified BERT model. Here is what I did:

import torch.nn as nn
from transformers import BertModel

# Load the first BERT model
model_pretrain = BertModel.from_pretrained('bert-base-uncased')

# Pass the input through the first BERT model.
# output_hidden_states=True is needed; otherwise outputs.hidden_states is None.
outputs = model_pretrain(input_ids, attention_mask=attention_mask, output_hidden_states=True)
hidden_states = outputs.hidden_states

# Get the output of the 6th layer from the first BERT model
# (hidden_states[0] is the embedding output, hidden_states[i] is the output of layer i)
layer_output = hidden_states[6]

# Self-defined model that includes BERT
model = net()

# Keep only the last 6 encoder layers of BERT (the first 6 are dropped)
num_kept_layers = 6  # Number of trailing layers to keep
encoder_layers = model.bert.encoder.layer[-num_kept_layers:]
model.bert.encoder.layer = nn.ModuleList(encoder_layers)

# Pass the sixth layer output as input to the second BERT model
outputs = model(inputs_embeds=layer_output)

Is this correct? Do I have to convert inputs_embeds to input_ids and attention_mask? If so, how can I achieve it? Thanks!
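One thing to keep in mind when checking this: in the stock BertModel, inputs_embeds still goes through the embedding module, so position and token-type embeddings plus LayerNorm are applied again on top of the layer-6 hidden states. A possible way around that, sketched below under assumptions, is to call the remaining encoder layers directly on the hidden states together with an additive attention mask. The tiny randomly initialised config is only there so the sketch runs without downloading weights, and the custom net() wrapper from the question is not modelled here.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)

# Assumption: a tiny random-weight BERT so the sketch runs offline.
# With real weights, use BertModel.from_pretrained('bert-base-uncased') instead.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=4,
    intermediate_size=64,
)
model = BertModel(config).eval()  # eval() disables dropout

# Dummy batch: 2 sequences of length 8, the second one padded at the end.
input_ids = torch.randint(1, config.vocab_size, (2, 8))
attention_mask = torch.ones_like(input_ids)
attention_mask[1, 6:] = 0

with torch.no_grad():
    # Reference run through all 12 layers, keeping every hidden state.
    full = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
    )
    layer6 = full.hidden_states[6]  # index 0 is the embedding output

    # Additive mask of shape [batch, 1, 1, seq]: 0 for real tokens,
    # a very large negative value for padding.
    ext_mask = (1.0 - attention_mask[:, None, None, :].float()) * torch.finfo(torch.float32).min

    # Run only the last 6 encoder layers on the layer-6 hidden states,
    # skipping the embedding module entirely.
    hidden = layer6
    for layer in model.encoder.layer[6:]:
        out = layer(hidden, attention_mask=ext_mask)
        # Depending on the transformers version, a layer may return a tuple or a tensor.
        hidden = out[0] if isinstance(out, tuple) else out

# Largest absolute difference from the full forward pass.
print((hidden - full.last_hidden_state).abs().max())
```

If the printed difference is close to zero, the split reproduces the full forward pass. No input_ids are needed for the second half, only the hidden states and the attention mask.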
