Can we add an extra word embedding to BERT?

Hi Everyone,

I am trying to add a tf-idf-weighted word2vec embedding to the BERT input as an experiment. The goal is to generate text using bert_to_bert, or a BERT encoder with another transformer decoder.


Hello! :grin:

I’m not sure I 100% understand what you’re trying to do.

If what you want is to add another embedding matrix to the existing word embedding matrix, you can do this:

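import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# Assumption: extra_embeddings is your own (vocab_size, hidden_size) matrix,
# e.g. projected tf-idf-weighted word2vec vectors; zeros are only a placeholder.
extra_embeddings = torch.zeros_like(model.embeddings.word_embeddings.weight)
with torch.no_grad():
    # Add it in place to the pre-trained word embedding matrix.
    model.embeddings.word_embeddings.weight.add_(extra_embeddings)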

If you want another embedding layer, then it’s a bit more complicated. You will need to copy the BertEmbeddings layer code from the transformers source (modeling_bert.py) and add the new layer there, under a new class name such as AugmentedBertEmbeddings. Once you’ve done that, you can swap out the whole module and restore the pre-trained weights:

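from transformers import AutoConfig

# Keep copies of the pre-trained embedding weights before replacing the module.
with torch.no_grad():
    word_embeddings = torch.clone(model.embeddings.word_embeddings.weight)
    pos_embeddings = torch.clone(model.embeddings.position_embeddings.weight)
    token_type_embeddings = torch.clone(model.embeddings.token_type_embeddings.weight)

config = AutoConfig.from_pretrained("bert-base-uncased")
# AugmentedBertEmbeddings is your modified copy of BertEmbeddings.
model.embeddings = AugmentedBertEmbeddings(config)

# Restore the pre-trained weights in the new module. Its LayerNorm is freshly
# initialised as well; clone and copy it the same way if you want to keep it.
with torch.no_grad():
    model.embeddings.word_embeddings.weight.copy_(word_embeddings)
    model.embeddings.position_embeddings.weight.copy_(pos_embeddings)
    model.embeddings.token_type_embeddings.weight.copy_(token_type_embeddings)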
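
For reference, here is a rough sketch of what an AugmentedBertEmbeddings module could look like. It is not the library's code: it only mirrors the general shape of BertEmbeddings (word, position and token-type embeddings, then LayerNorm and dropout) using plain PyTorch. The extra_embeddings table, the extra_projection layer and the extra_dim argument are hypothetical names I made up for the added embedding, and the SimpleNamespace config is a small stand-in for a real BertConfig so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

import torch
from torch import nn


class AugmentedBertEmbeddings(nn.Module):
    """BERT-style embeddings plus one extra, projected embedding table (sketch)."""

    def __init__(self, config, extra_dim=300):
        super().__init__()
        self.word_embeddings = nn.Embedding(
            config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id
        )
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size
        )
        self.token_type_embeddings = nn.Embedding(
            config.type_vocab_size, config.hidden_size
        )
        # Hypothetical addition: a second table (e.g. tf-idf-weighted word2vec
        # vectors indexed by token id) projected up to the BERT hidden size.
        self.extra_embeddings = nn.Embedding(config.vocab_size, extra_dim)
        self.extra_projection = nn.Linear(extra_dim, config.hidden_size, bias=False)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids, token_type_ids=None, position_ids=None, **kwargs):
        # **kwargs swallows any other arguments the surrounding model may pass.
        seq_len = input_ids.size(1)
        if position_ids is None:
            position_ids = torch.arange(seq_len, device=input_ids.device).unsqueeze(0)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)
        embeddings = (
            self.word_embeddings(input_ids)
            + self.position_embeddings(position_ids)
            + self.token_type_embeddings(token_type_ids)
            + self.extra_projection(self.extra_embeddings(input_ids))
        )
        return self.dropout(self.LayerNorm(embeddings))


# Tiny stand-in config; a real BertConfig exposes the same attribute names.
config = SimpleNamespace(
    vocab_size=100,
    hidden_size=16,
    max_position_embeddings=32,
    type_vocab_size=2,
    layer_norm_eps=1e-12,
    hidden_dropout_prob=0.1,
    pad_token_id=0,
)
embeddings = AugmentedBertEmbeddings(config, extra_dim=8)
embeddings.eval()  # disable dropout for the demo
input_ids = torch.tensor([[1, 5, 7, 2]])
output = embeddings(input_ids)  # shape: (batch, seq_len, hidden_size)
```

Note that the extra table and its projection start out randomly initialised, so you would load your own vectors into extra_embeddings and fine-tune at least the projection; the exact arguments BertModel passes to its embeddings module can differ between transformers versions.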

I hope I haven’t made any mistakes, and that I managed to help :slight_smile:

Thanks @beneyal. This seems complicated, as I may need to re-train everything. Can we instead combine the BERT embedding with my own embedding before passing it to the decoder?

BERT doesn’t have a decoder, so I’m not sure what you’re referring to.
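
If the decoder in question is the separate transformer decoder from the original post, one possible reading is to fuse BERT's output hidden states with the extra vectors before the decoder attends to them. The snippet below is only a sketch of that idea in plain PyTorch: EncoderOutputFusion is a hypothetical module name, the random tensors stand in for BERT's last hidden states and for the custom embeddings, and it assumes the custom vectors have already been aligned to BERT's wordpiece tokens (one vector per token).

```python
import torch
from torch import nn


class EncoderOutputFusion(nn.Module):
    """Concatenate encoder states with extra per-token vectors, then project."""

    def __init__(self, hidden_size, extra_dim):
        super().__init__()
        # Maps the concatenated features back to the size the decoder expects.
        self.projection = nn.Linear(hidden_size + extra_dim, hidden_size)

    def forward(self, encoder_hidden_states, extra_embeddings):
        fused = torch.cat([encoder_hidden_states, extra_embeddings], dim=-1)
        return self.projection(fused)


torch.manual_seed(0)
batch, seq_len, hidden_size, extra_dim = 2, 5, 16, 8
# Stand-in for the encoder's last hidden states: (batch, seq_len, hidden_size).
encoder_hidden_states = torch.randn(batch, seq_len, hidden_size)
# Stand-in for per-token custom vectors: (batch, seq_len, extra_dim).
extra_embeddings = torch.randn(batch, seq_len, extra_dim)

fusion = EncoderOutputFusion(hidden_size, extra_dim)
decoder_memory = fusion(encoder_hidden_states, extra_embeddings)
```

With this approach the pre-trained BERT weights stay untouched and only the projection (plus the decoder) needs training, at the cost of BERT itself never seeing the extra embedding.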