Question about the input to BERT

Hello, I want to maintain two different dictionaries: BERT’s original dictionary and a custom dictionary. The input would be [CLS] BERT dictionary corpus [SEP] custom dictionary corpus [SEP]. How do I handle the model input, and which part of the source code do I need to change? Thanks!

What are you trying to accomplish? The dictionary/vocabulary is an input to the tokenizer, so you should be able to just switch it (if it conforms to what the tokenizer and model expect), but I don’t see how you could use two different vocabularies for the same model and get any meaningful results.
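
To illustrate the concern, here is a toy sketch in plain Python. Both vocabularies are made up for the example (hypothetical tokens and ids, not BERT’s real vocabulary). It shows that two independent vocabularies reuse the same integer ids, so a single embedding table cannot tell them apart:

```python
# Toy vocabularies, invented for illustration only.
bert_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "paris": 3, "visited": 4}
entity_vocab = {"Paris_(France)": 0, "Paris_Hilton": 1, "Paris_(Texas)": 2}


def encode(tokens, vocab):
    """Map each token to its integer id in the given vocabulary."""
    return [vocab[t] for t in tokens]


word_ids = encode(["[CLS]", "paris", "[SEP]"], bert_vocab)
entity_ids = encode(["Paris_Hilton"], entity_vocab)

# Ids that occur in both sequences but refer to different tokens.
collisions = set(word_ids) & set(entity_ids)
print(word_ids, entity_ids, collisions)
```

Here id 1 is "[CLS]" in one vocabulary and "Paris_Hilton" in the other, so both would look up the same embedding row.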

My task is entity disambiguation. The entity embeddings and the word (mention) embeddings are different, so the input is [CLS] word (mention) embedding [SEP] entity embedding [SEP]. The output at [CLS] is the score between the mention and the entity.
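
One possible way to feed such an input to a single model, offered as a sketch rather than a confirmed recipe, is to merge the two vocabularies into one id space by shifting every entity id past the end of the BERT vocabulary. The toy vocabularies, `ENTITY_OFFSET`, and `build_input` below are all hypothetical names made up for this example:

```python
# Toy vocabularies, invented for illustration only.
bert_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "paris": 3, "visited": 4}
entity_vocab = {"Paris_(France)": 0, "Paris_Hilton": 1, "Paris_(Texas)": 2}

# Assumption: entity ids are placed directly after the word ids.
ENTITY_OFFSET = len(bert_vocab)
TOTAL_VOCAB_SIZE = len(bert_vocab) + len(entity_vocab)


def build_input(mention_tokens, entity_tokens):
    """Build [CLS] mention [SEP] entity [SEP] in a single merged id space."""
    first = (
        [bert_vocab["[CLS]"]]
        + [bert_vocab[t] for t in mention_tokens]
        + [bert_vocab["[SEP]"]]
    )
    second = [ENTITY_OFFSET + entity_vocab[t] for t in entity_tokens]
    second.append(bert_vocab["[SEP]"])
    input_ids = first + second
    # Segment ids: 0 for the mention part, 1 for the entity part.
    token_type_ids = [0] * len(first) + [1] * len(second)
    return input_ids, token_type_ids


input_ids, token_type_ids = build_input(["visited", "paris"], ["Paris_(France)"])
print(input_ids)
print(token_type_ids)
```

With this layout the embedding matrix needs `TOTAL_VOCAB_SIZE` rows. In Hugging Face Transformers, `model.resize_token_embeddings(...)` can grow the matrix; the added rows are not pretrained, so they would have to be learned during fine-tuning, along with a classification head on the [CLS] output for the mention-entity score.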