Hi everyone,
I am new to Hugging Face. I have a new architecture that modifies the internal layers of the BERT encoder and decoder blocks. I could build the whole model from scratch, but I would rather reuse the well-written BERT implementation that HF already provides.
How can I modify the layers in the BERT source code to suit my needs?
Thanks a lot!
Hi @imflash217,
Could you provide more details about the changes you want to make?
You can find the implementation here. It's pretty easy to follow; you can take it and change it in any way you want.
I want to multiply BERT's word embeddings by some vector before passing them on to the next operation in the encoder/decoder.
How can I do that after downloading the pretrained BERT base model?
Thanks
Hi, one easy way to do this is to write a simple class wrapper that will:
- extract the embedding output
- process it however you want
- send it back to the body of the architecture

(There is a minimal sketch of this idea after the link below.)
I have a Kaggle TensorFlow example (written with a slightly older library version) that applies exactly the same idea:
it makes XLM-GPT2 by taking the embedding output from XLM-R and sending it to GPT-2,
so that the new GPT-2 can handle multiple languages (fine-tuning is needed in this case):
https://www.kaggle.com/ratthachat/jigsaw-gpt2-with-xlm-r-embedding
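A minimal PyTorch sketch of that wrapper idea, assuming a pretrained `bert-base-uncased` and a user-supplied `transform_fn` (a placeholder name, not part of any library):

```python
import torch.nn as nn
from transformers import BertModel

class BertWithCustomEmbeddings(nn.Module):
    """Extract the embedding output, process it, send it back to the model body."""
    def __init__(self, transform_fn):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.transform_fn = transform_fn  # any callable on (batch, seq_len, hidden) tensors

    def forward(self, input_ids, attention_mask=None):
        embeds = self.bert.get_input_embeddings()(input_ids)  # 1) extract the embedding output
        embeds = self.transform_fn(embeds)                    # 2) process it however you want
        return self.bert(inputs_embeds=embeds,                # 3) send it back to the body
                         attention_mask=attention_mask)
```

For example, `BertWithCustomEmbeddings(lambda e: e * 2.0)` would double every word embedding before it reaches the encoder.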
Thanks @Jung.
I am stuck at this point. I am attaching my code below.
class FC_Embeddings(nn.Module):
    def __init__(self, D_in, **kwargs):
        super(FC_Embeddings, self).__init__()
        self.D_in = D_in
        self.word_embed = nn.Embedding(30522, D_in)  # BERT-base vocab size

    def forward(self, token_ids, *args, **kwargs):
        fc_idxs = kwargs.get("fc_idxs")
        print(fc_idxs)
        if fc_idxs is not None:
            out = self.word_embed(token_ids)
            # scale the embeddings at the chosen token positions by 2
            fc_mags = torch.ones(1, 80, self.D_in)
            fc_mags[:, fc_idxs, :] *= 2
            out *= fc_mags
        else:
            out = self.word_embed(token_ids)
        return out
%%time

import torch
import torch.nn as nn
from transformers import BertModel

# Create the BertClassifier class
class BertClassifier(nn.Module):
    """BERT model for classification tasks."""
    def __init__(self, freeze_bert=False):
        super(BertClassifier, self).__init__()
        D_in, H, D_out = 768, 50, 5

        # Instantiate the BERT model and swap in the custom embedding module
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.bert.base_model.embeddings.word_embeddings = FC_Embeddings(D_in)

        # Instantiate a one-layer feed-forward classifier
        self.classifier = nn.Sequential(
            nn.Linear(D_in, H),
            nn.ReLU(),
            nn.Linear(H, D_out)
        )

        # Freeze the BERT model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask, *args, **kwargs):
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask, *args, **kwargs)
        # Extract the last hidden state of the `[CLS]` token for the classification task
        last_hidden_state_cls = outputs[0][:, 0, :]
        # Feed the [CLS] representation to the classifier to compute logits
        logits = self.classifier(last_hidden_state_cls)
        return logits
When I call the modified BERT, I get this error:
token_ids = preprocessing_for_bert([X[1111]])[0]
token_mask = preprocessing_for_bert([X[1111]])[1]
b = BertClassifier()
kwargs = {"fc_idxs": [1, 2, -1]}
out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)
ERROR:
TypeError Traceback (most recent call last)
<ipython-input-138-aac1926d7204> in <module>
12 b = BertClassifier()
13 kwargs = {"fc_idxs": [1, 2, -1]}
---> 14 out = b(input_ids=token_ids, attention_mask=token_mask, **kwargs)
15
16 out
~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
<timed exec> in forward(self, input_ids, attention_mask, *args, **kwargs)
~/anaconda3/envs/aogtr/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
TypeError: forward() got an unexpected keyword argument 'fc_idxs'
Any guidance would be very helpful!
Thanks
Hi, I am not a great PyTorch coder, but I can give some rough ideas to fix the error.

First, if I understand your objective correctly, you should extract the pretrained embedding output rather than redefining it with `FC_Embeddings` as you do now. So send your input to BERT's pretrained embedding layer (i.e. pass `input_ids` through it to get the embedded output; let's name it `x`).

Secondly, only at this point should you use your `kwargs['fc_idxs']` to do whatever you want with `x` to get your desired output; let's simply name it `y`.

Then you can send `y` to `self.bert`'s upper layers (everything above the embedding layer), but do not send `kwargs['fc_idxs']` to `self.bert`, since it does not know this parameter.

NOTE: to send the embedded vector to `self.bert`'s upper layers, you need to pass `inputs_embeds` instead of `input_ids`.

Please see the manual for reference on `inputs_embeds` vs. `input_ids`.

And you can see my TensorFlow example doing exactly this.
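Here is a rough PyTorch sketch of those three steps applied to the `BertClassifier` above (my own illustration, not a definitive implementation; it assumes `fc_idxs` lists the token positions whose word embeddings should be scaled by 2, as in the `FC_Embeddings` snippet):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertClassifier(nn.Module):
    def __init__(self, freeze_bert=False):
        super().__init__()
        D_in, H, D_out = 768, 50, 5
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Sequential(nn.Linear(D_in, H), nn.ReLU(), nn.Linear(H, D_out))
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask, fc_idxs=None):
        # 1) run only the *pretrained* embedding lookup to get x
        x = self.bert.get_input_embeddings()(input_ids)      # (batch, seq_len, 768)
        # 2) apply the custom processing to get y (here: scale the chosen positions by 2)
        if fc_idxs is not None:
            y = x.clone()
            y[:, fc_idxs, :] *= 2
        else:
            y = x
        # 3) send y to BERT's upper layers via `inputs_embeds`;
        #    do NOT forward fc_idxs to self.bert, since it does not know that argument
        outputs = self.bert(inputs_embeds=y, attention_mask=attention_mask)
        last_hidden_state_cls = outputs[0][:, 0, :]          # [CLS] representation
        return self.classifier(last_hidden_state_cls)
```

With this version, `out = b(input_ids=token_ids, attention_mask=token_mask, fc_idxs=[1, 2, -1])` should no longer raise the `TypeError`, because `fc_idxs` is consumed before BERT is called.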
Yes, you are correct. I have just one doubt:
How could I find the layers of BERT in a sequential manner, so that I can just index them and get all the layers after the embedding layer with something like `bert.layers[1:]`?
But there is no `layers` attribute like in your TF code. Any ideas on this?
Because I could not find it, I tried to change the layer itself instead.
I can get the hidden states of the trained model, but how would I get at the parts of the model itself so that I can train/fine-tune them with my submodule above?
Thanks
Regarding your doubt, please see my last comment again: you can send `inputs_embeds` instead of `input_ids` to `self.bert`.
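For reference (and regarding the earlier question about a `layers` attribute), the PyTorch `BertModel` exposes its pieces as named submodules rather than a flat `layers` list as in Keras. A quick way to inspect them, assuming `bert-base-uncased`:

```python
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")

print(bert.embeddings)           # word + position + token-type embeddings, LayerNorm, dropout
print(len(bert.encoder.layer))   # ModuleList of BertLayer blocks -> 12 for bert-base
print(bert.pooler)               # pooling head applied to the [CLS] hidden state

# The transformer blocks can be indexed directly, e.g. to freeze the lower half:
for block in bert.encoder.layer[:6]:
    for p in block.parameters():
        p.requires_grad = False

# Calling the blocks by hand also requires building the extended attention mask yourself,
# which is why passing `inputs_embeds` to the full model, as suggested above, is simpler.
```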
This works great. I can just create the embeddings my way and call BERT with the embeddings instead of the ids… superb… Thanks a lot @Jung!
I'm having the same problem and would like to know if you have working code that changes the layers. It would help me a lot in my studies.
Can we add positional embeddings to custom embeddings?
If we send a custom embedding vector, we may lose the positional embeddings and segment embeddings, which are part of BERT's input embeddings. Can we combine positional embeddings with the custom embeddings before sending them to the upper layers of BERT?
The `inputs_embeds` parameter only replaces the `nn.Embedding` word-embedding lookup; the position and token-type embeddings are still added to the input embeddings inside the model.
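A quick sanity check of that behaviour (a sketch, assuming `bert-base-uncased`): feeding the word-embedding lookup through `inputs_embeds` reproduces the output of feeding `input_ids` directly, because the position and token-type embeddings are added inside the model either way.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

enc = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    out_from_ids = model(**enc).last_hidden_state
    # Replace only the word-embedding lookup; position/segment embeddings are still added internally.
    word_embeds = model.get_input_embeddings()(enc["input_ids"])
    out_from_embeds = model(inputs_embeds=word_embeds,
                            attention_mask=enc["attention_mask"]).last_hidden_state

print(torch.allclose(out_from_ids, out_from_embeds))  # expected: True
```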