Hello, I want to use the output of the first several layers of BERT/DistilBERT as the input to the last few layers. For example, in BERT I want to first get the output of the 6th layer, then feed that output into a modified BERT model that contains only the last 6 layers of the original. I found that I can get the output embeddings of each layer, but I am wondering whether I have to convert them back to input_ids and an attention_mask before feeding them into my modified model. Here is what I did:
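(For completeness, here is the setup I'm assuming — the standard bert-base-uncased tokenizer, and the sample sentence is just a placeholder:)
from transformers import BertModel, BertTokenizer
import torch.nn as nn
# Tokenize a sample input to get input_ids and attention_mask
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoded = tokenizer("This is a sample sentence.", return_tensors='pt')
input_ids = encoded['input_ids']
attention_mask = encoded['attention_mask']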
# Load the first BERT model
model_pretrain = BertModel.from_pretrained('bert-base-uncased')
# Pass the input through the first BERT model; hidden states are only
# returned when output_hidden_states=True
outputs = model_pretrain(input_ids, attention_mask=attention_mask, output_hidden_states=True)
hidden_states = outputs.hidden_states
# hidden_states[0] is the embedding output, so hidden_states[6] is the
# output of the 6th encoder layer
layer_output = hidden_states[6]
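# Sanity check: layer_output should have shape (batch_size, seq_len, hidden_size),
# i.e. (1, seq_len, 768) for bert-base-uncased
print(layer_output.shape)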
# Self-defined model that includes BERT (net is my own class)
model = net()
# Keep only the last 6 encoder layers of BERT (i.e., drop the first 6)
num_kept_layers = 6
encoder_layers = model.bert.encoder.layer[-num_kept_layers:]
model.bert.encoder.layer = nn.ModuleList(encoder_layers)
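# Sanity check: the truncated encoder should now contain 6 layers
print(len(model.bert.encoder.layer))  # expect 6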
# Pass the sixth layer output as input to the second BERT model
outputs = model(inputs_embeds=layer_output)
Is this correct? Do I have to convert inputs_embeds back to input_ids and an attention_mask, and if so, how can I achieve that? Thanks!