How to modify the internal layers of BERT

imflash217 · October 5, 2020, 12:13pm

Hi everyone,
I am new to Hugging Face. I have a new architecture that modifies the internal layers of the BERT encoder and decoder blocks. I could build the whole model from scratch, but I would rather reuse the well-written BERT implementation from HF.
How can I modify the layers in the BERT source code to suit my needs?
Thanks a lot!

valhalla · October 5, 2020, 12:24pm

hi @imflash217

Could you provide more details about the changes you want to make?
You can find the implementation here. It’s pretty easy to follow; you can take it and change it in any way you want.

imflash217 · November 13, 2020, 3:05am

I want to multiply BERT's word embeddings by some vector before passing them to the next operation in the encoder/decoder.

How can I do that after downloading the pretrained BERT base model?

Thanks

Jung · November 13, 2020, 4:15am

Hi, one easy way to do this is to write a simple wrapper class that will:

extract the embedded output
process it however you want
send it back to the body of the architecture

I have a Kaggle TensorFlow example (written for a slightly older version) that applies exactly the same idea:
it builds an XLM-GPT2 by taking the embedding output from XLM-R and sending it to GPT-2,
so that the new GPT-2 can handle multiple languages (this needs fine-tuning).

https://www.kaggle.com/ratthachat/jigsaw-gpt2-with-xlm-r-embedding

imflash217 · November 14, 2020, 12:32am

Thanks @Jung.
I am stuck at this point. I am attaching my code below.

class FC_Embeddings(nn.Module):
    def __init__(self, D_in, **kwargs):
        super(FC_Embeddings, self).__init__()
        self.D_in = D_in
        # 30522 is the vocabulary size of bert-base-uncased
        self.word_embed = nn.Embedding(30522, D_in)

    def forward(self, token_ids, *args, **kwargs):
        # kwargs is always a dict (never None), so use .get() to keep fc_idxs optional
        fc_idxs = kwargs.get("fc_idxs")
        out = self.word_embed(token_ids)
        if fc_idxs is not None:
            # Per-position multipliers: 2 at the selected positions, 1 elsewhere
            fc_mags = torch.ones_like(out)
            fc_mags[:, fc_idxs, :] *= 2
            out = out * fc_mags
        return out

%%time
import torch
import torch.nn as nn
from transformers import BertModel

# Create the BertClassifier class
class BertClassifier(nn.Module):
    """Bert Model for Classification Tasks.
    """
    def __init__(self, freeze_bert=False):
        super(BertClassifier, self).__init__()
        D_in, H, D_out = 768, 50, 5

        # Instantiate BERT model
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.bert.base_model.embeddings.word_embeddings = FC_Embeddings(D_in)

        # Instantiate a feed-forward classifier with one hidden layer
        self.classifier = nn.Sequential(
            nn.Linear(D_in, H),
            nn.ReLU(),
            nn.Linear(H, D_out)
        )

        # Freeze the BERT model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False
        
    def forward(self, input_ids, attention_mask, *args, **kwargs):
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask, *args, **kwargs)
        # print(outputs.shape)
        # Extract the last hidden state of the token `[CLS]` for classification task
        last_hidden_state_cls = outputs[0][:, 0, :]

        # Feed input to classifier to compute logits
        logits = self.classifier(last_hidden_state_cls)

        return logits

When I call the modified BERT I am getting this error:

token_ids = preprocessing_for_bert([X[1111]])[0]
token_mask = preprocessing_for_bert([X[1111]])[1]

b = BertClassifier()
kwargs = {"fc_idxs": [1, 2, -1]}
out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)

ERROR:

TypeError                                 Traceback (most recent call last)
<ipython-input-138-aac1926d7204> in <module>
     12 b = BertClassifier()
     13 kwargs = {"fc_idxs": [1, 2, -1]}
---> 14 out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)
     15 
     16 out

~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<timed exec> in forward(self, input_ids, attention_mask, *args, **kwargs)

~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

TypeError: forward() got an unexpected keyword argument 'fc_idxs'

Any guidance would be very helpful!
Thanks

Jung · November 14, 2020, 2:25am

Hi, I am not a strong PyTorch coder, but I can give some rough ideas for fixing the error.

First, if I understand your objective correctly, you should extract the pretrained embedding output rather than redefine the layer with FC_Embeddings as you do now. So send your input_ids to BERT’s pretrained embedding layer to get the embedded output; let's call it x.

Second, this is the only place where you should use kwargs['fc_idxs']: apply whatever you want to x to get your designed output; let's call it y.

Then send y to self.bert's upper layers (everything after the embedding layer), but do not pass kwargs['fc_idxs'] to self.bert, since it doesn’t know this parameter. That is what causes the TypeError.

NOTE: to send the embedded vector to self.bert's upper layers, you need to pass inputs_embeds instead of input_ids.

Please see the manual for reference on inputs_embeds vs. input_ids:

You can see my TensorFlow example doing exactly this.
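
The three steps above can be sketched in PyTorch roughly as follows. This is only an illustration under some assumptions: the wrapper name ScaledEmbeddingBert and its scale argument are made up for this example, and a tiny randomly initialised BertConfig stands in for 'bert-base-uncased' so that it runs without a download. With the real model you would pass BertModel.from_pretrained('bert-base-uncased') instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class ScaledEmbeddingBert(nn.Module):
    """Hypothetical wrapper: scale selected positions of BERT's word
    embeddings, then run the rest of BERT on the result."""

    def __init__(self, bert, scale=2.0):
        super().__init__()
        self.bert = bert
        self.scale = scale

    def forward(self, input_ids, attention_mask=None, fc_idxs=None):
        # Step 1: look up the (pretrained) word embeddings -> x
        x = self.bert.embeddings.word_embeddings(input_ids)
        # Step 2: custom processing -> y (here: scale the chosen positions)
        if fc_idxs is not None:
            mags = torch.ones_like(x)
            mags[:, fc_idxs, :] = self.scale
            x = x * mags
        # Step 3: pass y through inputs_embeds; fc_idxs never reaches BERT
        outputs = self.bert(inputs_embeds=x, attention_mask=attention_mask)
        return outputs[0]  # last hidden state


# Tiny random config (an assumption for this sketch, not the real checkpoint)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = ScaledEmbeddingBert(BertModel(config)).eval()

input_ids = torch.randint(0, 100, (1, 8))
attention_mask = torch.ones_like(input_ids)
with torch.no_grad():
    hidden = model(input_ids, attention_mask, fc_idxs=[1, 2, -1])
print(hidden.shape)
```

Because fc_idxs is consumed inside the wrapper and only inputs_embeds and attention_mask are forwarded, BERT's own forward() never sees the unknown keyword.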

imflash217 · November 14, 2020, 2:58am

Yes, you are correct. I have just one question:
How can I access the layers of BERT sequentially, so that I can index them and get all the layers after the embedding layer with something like bert.layers[1:]?
There is no layers attribute like the one in your TF code.
Any ideas on this?

Because I could not find it, I tried to replace the layer itself instead.

I can get the hidden states of the trained model. But how do I get the parts of the model itself, so that I can train/fine-tune them together with my submodule above?

Thanks

Jung · November 14, 2020, 3:49am

Regarding your question, please see my last comment again. You can send inputs_embeds instead of input_ids to self.bert, so you do not need to index the layers yourself.
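
For the separate question of reaching the individual layers: in the PyTorch BertModel the embedding block is exposed as bert.embeddings and the transformer blocks as bert.encoder.layer, which is an nn.ModuleList and can be indexed or sliced. A small sketch, again assuming a tiny random config in place of the pretrained checkpoint, that freezes everything except the last block:

```python
import torch.nn as nn
from transformers import BertConfig, BertModel

# Tiny random config so this runs without downloading weights (an assumption)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=3,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config)

embeddings = bert.embeddings   # word + position + token-type embeddings
layers = bert.encoder.layer    # nn.ModuleList of transformer blocks
print(type(layers).__name__, len(layers))

# Freeze the whole model, then unfreeze only the last transformer block
for param in bert.parameters():
    param.requires_grad = False
for param in layers[-1].parameters():
    param.requires_grad = True

trainable = [name for name, p in bert.named_parameters() if p.requires_grad]
```

The same attributes can be used to swap a block for a custom module, as long as the replacement accepts and returns what the encoder expects.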

imflash217 · November 14, 2020, 4:51am

This works great. I can just create embeddings my way and call BERT with embeddings instead of ids…superb…Thanks a lot @Jung

calusbr · June 3, 2021, 11:01am

I’m having the same problem and would like to know if you have code that managed to change layers? It would help me a lot in my studies.

ShivaniSri · January 4, 2022, 10:19am

Can we add positional embeddings to custom embeddings?

ShivaniSri · January 4, 2022, 10:25am

If we send a custom embedding vector, we may lose the positional and segment embeddings, which are part of BERT's input embeddings. Can we combine positional embeddings with the custom embeddings before sending them to the upper layers of BERT?

ankitbansal811 · July 19, 2023, 5:02pm

The inputs_embeds parameter only replaces the output of the word-embedding nn.Embedding layer. The position and token-type (segment) embeddings are still added to the input embeddings.
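
One way to check this yourself is to compare the output for input_ids with the output for inputs_embeds built from the same word-embedding lookup. If position and token-type embeddings were skipped on the inputs_embeds path, the two would differ. A minimal sketch, assuming a tiny randomly initialised model in eval mode (so dropout is off):

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Tiny random config so this runs without downloading weights (an assumption)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config).eval()

input_ids = torch.randint(0, 100, (2, 8))
with torch.no_grad():
    # Path A: BERT does the word-embedding lookup itself
    from_ids = bert(input_ids=input_ids)[0]
    # Path B: do only the word-embedding lookup by hand, pass it as inputs_embeds
    word_vecs = bert.embeddings.word_embeddings(input_ids)
    from_embeds = bert(inputs_embeds=word_vecs)[0]

same = torch.allclose(from_ids, from_embeds, atol=1e-6)
print(same)
```

Any custom processing should therefore be applied to the word vectors only; BERT adds the positional and segment information afterwards.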
