Can we add an extra word embedding to BERT?

Hi Everyone,

As an experiment, I am trying to add a TF-IDF-weighted word2vec embedding to the BERT input, with the goal of generating text using a bert_to_bert model or a BERT encoder paired with another transformer decoder.

Thanks,
Rohan

Hello! :grin:

I’m not sure I 100% understand what you’re trying to do.

If what you want is to add another embedding matrix to the existing word embedding matrix, you can do this:

import torch

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    model.embeddings.word_embeddings.add_(word2vec_matrix)
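
For the snippet above, word2vec_matrix needs the same shape as BERT's word embedding weight, i.e. [vocab_size, hidden_size] (30522 x 768 for bert-base-uncased). A rough sketch of how you might build it, assuming your TF-IDF-weighted word2vec vectors are already 768-dimensional (the w2v_vectors and tfidf_weight lookups below are hypothetical placeholders for your own data):

import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

vocab_size = model.config.vocab_size    # 30522
hidden_size = model.config.hidden_size  # 768

# One row per wordpiece in BERT's vocabulary; tokens without a vector stay zero.
word2vec_matrix = torch.zeros(vocab_size, hidden_size)
for token, token_id in tokenizer.get_vocab().items():
    if token in w2v_vectors:  # hypothetical mapping: token -> 768-d word2vec vector
        vec = torch.tensor(w2v_vectors[token], dtype=torch.float)
        word2vec_matrix[token_id] = tfidf_weight.get(token, 1.0) * vec  # hypothetical tfidf lookup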

If you want another embedding layer, then it’s a bit more complicated. You will need to copy the BertEmbeddings layer code from here and add the new layer there. Once you’ve done that, you can just switch the whole module and re-set the pre-trained weights:

with torch.no_grad():
    word_embeddings = torch.clone(model.embeddings.word_embeddings.weight)
    pos_embeddings = torch.clone(model.embeddings.position_embeddings.weight)
    token_type_embeddings = torch.clone(model.embeddings.token_type_embeddings.weight)

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
model.embeddings = AugmentedBertEmbeddings(config)  # your modified copy of BertEmbeddings

with torch.no_grad():
    model.embeddings.word_embeddings.weight.set_(word_embeddings)
    model.embeddings.position_embeddings.weight.set_(pos_embeddings)
    model.embeddings.token_type_embeddings.weight.set_(token_type_embeddings)
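
To be clear, AugmentedBertEmbeddings is not a class that ships with transformers; it stands for your own modified copy of BertEmbeddings. A trimmed-down sketch of what that copy could look like (the extra_embeddings layer and the way it is summed into the forward pass are just my assumptions about where your new embedding would go):

import torch
import torch.nn as nn

class AugmentedBertEmbeddings(nn.Module):
    """Simplified copy of BertEmbeddings with one extra embedding layer summed in."""

    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        # The new layer: here it is indexed with the same input_ids, but it could
        # just as well take separate ids (e.g. syntactic tags).
        self.extra_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)

        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.register_buffer(
            "position_ids", torch.arange(config.max_position_embeddings).expand((1, -1))
        )

    def forward(self, input_ids=None, token_type_ids=None, position_ids=None,
                inputs_embeds=None, past_key_values_length=0):
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        seq_length = inputs_embeds.size(1)
        if position_ids is None:
            position_ids = self.position_ids[:, past_key_values_length:past_key_values_length + seq_length]
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)

        embeddings = (
            inputs_embeds
            + self.position_embeddings(position_ids)
            + self.token_type_embeddings(token_type_ids)
            + self.extra_embeddings(input_ids)  # the new term
        )
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

After swapping the module in as above, only extra_embeddings starts from random weights; the three copied matrices keep their pre-trained values.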

I hope I haven’t made any mistakes, and that I managed to help :slight_smile:

Thanks @beneyal. This seems complicated, as I may need to re-train everything. Can we combine the BERT embedding and my own embedding before passing them to the decoder?

BERT doesn’t have a decoder, so I’m not sure what you’re referring to.
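
If what you mean is the bert_to_bert setup from your first post, that decoder isn't part of BertModel itself; in transformers you would usually get it from EncoderDecoderModel, which warm-starts a decoder (with cross-attention added) from a BERT checkpoint. A minimal sketch, in case that's the direction you're after:

from transformers import BertTokenizerFast, EncoderDecoderModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Encoder and decoder are both initialized from BERT; the decoder gets cross-attention layers.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

# generate() needs to know how sequences start, pad, and end.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id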

Hi @beneyal, I am planning to pre-train BERT with an extra input embedding (e.g., one input from a WordPiece tokenizer and one from BPE). Can I do that with the same method you explained? Actually, I am planning to add some syntactic information in the second embedding. Does that make sense?

Hello, I think my question here, "How to concat laserembeddings with huggingface funnel transformers simple CLS output for fine tuning on downstream NLP sequence classification data problem?", is similar. Can you please help? Thanks in advance.

Your code doesn't work:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_17/1823014925.py in <module>
      8 model = BertModel.from_pretrained("bert-base-uncased")
      9 with torch.no_grad():
---> 10     model.embeddings.word_embeddings.add_(fold0_laser)
     11 # model.deberta.embeddings.word_embeddings

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1184                 return modules[name]
   1185         raise AttributeError("'{}' object has no attribute '{}'".format(
-> 1186             type(self).__name__, name))
   1187 
   1188     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'Embedding' object has no attribute 'add_'
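
For what it's worth, that AttributeError looks like it comes from calling add_ on the nn.Embedding module itself; the in-place add has to go on its weight tensor instead. Something like this should get past the error, assuming fold0_laser already has the same shape as the embedding weight:

import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    # add_ is a tensor method, so it lives on .weight, not on the Embedding module
    model.embeddings.word_embeddings.weight.add_(fold0_laser)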