How to modify the internal layers of BERT

Hi everyone,
I am new to Hugging Face. I have a new architecture that modifies the internal layers of the BERT encoder and decoder blocks. I could build the whole model from scratch, but I would rather reuse the well-written BERT implementation from HF.
How can I modify the layers in the BERT source code to suit my needs?
Thanks a lot!

hi @imflash217

Could you provide more details about the changes you want to make?
You can find the implementation here. It’s pretty easy to follow, you can take it and change it in any way you want.

I want to multiply the word embeddings of BERT by some vector before passing them to the next operation in the encoder/decoder.

How can I do that after I downloaded the pretrained BERT base?


Hi, one easy way to do this is to write a simple wrapper class that will:

  1. extract the embedded output
  2. process it however you want
  3. send it back to the body of the architecture

I have a Kaggle TensorFlow example (written for a somewhat older version) that applies exactly the same idea:
it builds XLM-GPT2 by taking the embedding output from XLM-R and sending it to GPT-2,
so that the new GPT-2 can handle multiple languages (fine-tuning is needed in this case).


Thanks @Jung.
I am stuck at this point. I am attaching my code below.

import torch
import torch.nn as nn
from transformers import BertModel

class FC_Embeddings(nn.Module):
    def __init__(self, D_in, **kwargs):
        super(FC_Embeddings, self).__init__()
        self.D_in = D_in
        # 30522 is the vocabulary size of bert-base-uncased
        self.word_embed = nn.Embedding(30522, D_in)

    def forward(self, token_ids, *args, **kwargs):
        # fc_idxs: optional list of token positions whose embeddings get scaled
        fc_idxs = kwargs.get("fc_idxs")
        out = self.word_embed(token_ids)
        if fc_idxs is not None:
            # Multiplier of 1 everywhere except the selected positions, which get 2.
            # Shape (1, 80, D_in) broadcasts over the batch; assumes sequences padded to length 80.
            fc_mags = torch.ones(1, 80, self.D_in)
            fc_mags[:, fc_idxs, :] *= 2
            out = out * fc_mags
        return out

# Create the BertClassifier class
class BertClassifier(nn.Module):
    """Bert Model for Classification Tasks."""
    def __init__(self, freeze_bert=False):
        super(BertClassifier, self).__init__()
        D_in, H, D_out = 768, 50, 5

        # Instantiate BERT model
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.bert.base_model.embeddings.word_embeddings = FC_Embeddings(D_in)

        # Instantiate a small feed-forward classifier head
        self.classifier = nn.Sequential(
            nn.Linear(D_in, H),
            nn.Linear(H, D_out)
        )

        # Freeze the BERT model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False
    def forward(self, input_ids, attention_mask, *args, **kwargs):
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask, *args, **kwargs)
        # print(outputs.shape)
        # Extract the last hidden state of the token `[CLS]` for classification task
        last_hidden_state_cls = outputs[0][:, 0, :]

        # Feed input to classifier to compute logits
        logits = self.classifier(last_hidden_state_cls)

        return logits

When I call the modified BERT I am getting this error:

token_ids = preprocessing_for_bert([X[1111]])[0]
token_mask = preprocessing_for_bert([X[1111]])[1]

b = BertClassifier()
kwargs = {"fc_idxs": [1, 2, -1]}
out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)


TypeError                                 Traceback (most recent call last)
<ipython-input-138-aac1926d7204> in <module>
     12 b = BertClassifier()
     13 kwargs = {"fc_idxs": [1, 2, -1]}
---> 14 out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)
     16 out

~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<timed exec> in forward(self, input_ids, attention_mask, *args, **kwargs)

~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

TypeError: forward() got an unexpected keyword argument 'fc_idxs'

Any guidance would be very helpful!

Hi, I am not an expert PyTorch coder, but I can give you some rough ideas for fixing the error.

First, if I understand your objective correctly, you should extract the pretrained embedding output rather than redefine it with FC_Embeddings as you do now. So send your input to BERT’s pretrained embedding layer: pass input_ids to get the embedded output, and call it x.

Second, this is the only place where you should use kwargs['fc_idxs']: apply whatever operation you want to x to get your designed output, and call it y.

After that, you can send y to self.bert's upper layers (everything above the embedding layer), but do not pass kwargs['fc_idxs'] to self.bert, since it doesn’t know this parameter.

NOTE: to send the embedded vector to self.bert's upper layers, you need to pass inputs_embeds instead of input_ids.

Please see the manual for reference on inputs_embeds vs. input_ids:

And you can see my TensorFlow example, which does exactly this.
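
The steps described in this reply can be sketched in PyTorch roughly as follows. This is a minimal sketch, not anyone's actual code from the thread: the class name ScaledEmbeddingClassifier and the scale argument are hypothetical, and a tiny randomly initialised BertConfig stands in for the pretrained model so the snippet runs without a download. With real weights you would pass BertModel.from_pretrained('bert-base-uncased') instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class ScaledEmbeddingClassifier(nn.Module):
    """Hypothetical wrapper: scale word embeddings at chosen positions, then run BERT."""

    def __init__(self, bert, num_labels=5):
        super().__init__()
        self.bert = bert
        self.classifier = nn.Linear(bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, fc_idxs=None, scale=2.0):
        # Step 1: look up the word embeddings only (call this x).
        x = self.bert.embeddings.word_embeddings(input_ids)

        # Step 2: custom processing; fc_idxs is consumed here and never reaches BERT.
        if fc_idxs is not None:
            mags = torch.ones_like(x)
            mags[:, fc_idxs, :] = scale
            x = x * mags

        # Step 3: hand the result to BERT via inputs_embeds instead of input_ids.
        outputs = self.bert(inputs_embeds=x, attention_mask=attention_mask)

        # First element is the last hidden state; take the [CLS] position.
        cls_state = outputs[0][:, 0, :]
        return self.classifier(cls_state)


# Tiny random model (assumed sizes, for illustration only).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    max_position_embeddings=64)
model = ScaledEmbeddingClassifier(BertModel(config)).eval()

input_ids = torch.randint(0, 100, (2, 10))
attention_mask = torch.ones_like(input_ids)
with torch.no_grad():
    logits = model(input_ids, attention_mask, fc_idxs=[1, 2, -1])
print(logits.shape)  # torch.Size([2, 5])
```

Because fc_idxs is an argument of the wrapper's own forward, nothing unknown is forwarded to self.bert, which is what triggered the TypeError above.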

Yes, you are correct. I have just one question:
How can I access the layers of BERT sequentially, so that I can index them and get all the layers after the embedding layer with something like bert.layers[1:]?
There is no layers attribute like the one in your TF code.
Any ideas on this?

Because I could not find it, I tried to change the layer itself instead.

I can get the hidden states of the trained model. But how do I get the parts of the model itself, so that I can train/fine-tune them together with my submodule above?
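
For what it's worth, in the PyTorch BertModel the embedding block and the transformer blocks are separate submodules, so there is no single flat list to slice. A minimal sketch of how to inspect and index them, assuming a tiny randomly initialised config (the sizes are illustrative only) in place of the pretrained checkpoint:

```python
from transformers import BertConfig, BertModel

# Tiny random model so the snippet runs without downloading weights.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config)

# Top-level submodules of the model.
top_level = [name for name, _ in bert.named_children()]
print(top_level)

# The embedding block lives at bert.embeddings; the transformer
# blocks are an nn.ModuleList at bert.encoder.layer.
layers = bert.encoder.layer
print(len(layers))  # 4, matching num_hidden_layers above

# A ModuleList supports indexing and slicing.
upper = layers[1:]

# Example: freeze everything except the last two transformer blocks.
for layer in layers[:-2]:
    for param in layer.parameters():
        param.requires_grad = False
```

The sliced blocks are the same module objects as in the original model, so fine-tuning them updates the model in place.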


Regarding your doubt, please see my last comment again. You can send inputs_embeds instead of input_ids to self.bert.

This works great. I can just create the embeddings my way and call BERT with embeddings instead of ids. Superb! Thanks a lot @Jung


I’m having the same problem. Do you have code that managed to change the layers? It would help me a lot in my studies.


Can we add positional embeddings to custom embeddings?

If we send a custom embedding vector, we may lose the positional embeddings and segment embeddings, which are part of the input embeddings of the BERT model. Can we combine the positional embeddings with the custom embeddings before sending them to the upper layers of BERT?
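
As far as I can tell from the Hugging Face PyTorch implementation, inputs_embeds only replaces the word-embedding lookup, and the position and segment (token type) embeddings are still added inside the embedding block. One way to check this on your installed version is to compare the two call styles. This is a rough sketch with a tiny randomly initialised config (sizes are illustrative assumptions), not a statement about every version of the library:

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random model so the check runs without downloading weights.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    max_position_embeddings=64)
bert = BertModel(config).eval()  # eval() disables dropout for a fair comparison

input_ids = torch.randint(0, 100, (2, 10))

with torch.no_grad():
    # Normal path: the model does the word-embedding lookup itself.
    out_from_ids = bert(input_ids=input_ids)[0]

    # Custom path: do only the word-embedding lookup by hand,
    # then pass the result in as inputs_embeds.
    word_vecs = bert.embeddings.word_embeddings(input_ids)
    out_from_embeds = bert(inputs_embeds=word_vecs)[0]

# If position and segment embeddings were dropped on the inputs_embeds
# path, the two outputs would differ.
same = torch.allclose(out_from_ids, out_from_embeds, atol=1e-6)
print(same)
```

If the two outputs match, custom vectors passed through inputs_embeds get the same positional and segment treatment as ordinary token lookups, so there is no need to add those embeddings yourself.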