How to concatenate the word embedding for special tokens and words

I tried to add an extra dimension to the word embeddings of a Huggingface pre-trained BERT model. The extra column represents an extra label. For example, if the original embedding of the word “dog” were [1,1,1,1,1,1,1], I might append a special column with the value 2 to represent ‘noun’, so the new embedding becomes [1,1,1,1,1,1,1,2]. I would then feed the new input [1,1,1,1,1,1,1,2] into the BERT model. How can I do this in Huggingface?
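In tensor terms, appending one label column per token is a concatenation along the last (hidden) dimension. The minimal PyTorch sketch below assumes the labels are already integer ids aligned one-to-one with the tokens; the tensor sizes and tag values are made up for illustration. It also shows the catch: the pre-trained encoder expects exactly hidden_size features per token (768 for bert-base), so the widened vector has to be mapped back, here with a trainable linear layer, which is one possible choice rather than the only one.

```python
import torch
import torch.nn as nn

# Toy stand-in for BERT word embeddings: 1 sequence, 3 tokens, hidden size 7
# (bert-base actually uses 768).
word_embeds = torch.ones(1, 3, 7)

# Hypothetical per-token labels, e.g. 2 = "noun".
labels = torch.tensor([[2, 0, 2]])

# Append the label as one extra column: (1, 3, 7) -> (1, 3, 8).
extended = torch.cat([word_embeds, labels.unsqueeze(-1).float()], dim=-1)
print(extended.shape)  # torch.Size([1, 3, 8])
print(extended[0, 0])  # tensor([1., 1., 1., 1., 1., 1., 1., 2.])

# The encoder still expects 7 features per token here, so project 8 -> 7
# with a trainable layer before handing the vectors to the model.
project = nn.Linear(8, 7)
model_ready = project(extended)
print(model_ready.shape)  # torch.Size([1, 3, 7])
```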

There is a method called tokenizer.add_special_tokens that extends the original vocabulary with new tokens. However, I want to concatenate the embedding of a word from the original vocabulary with the embedding of the new token. For example, I want the BERT model to understand that “dog” is a noun by concatenating the embedding of “dog” with the embedding of “noun”. Should I even change the input word embeddings of a pre-trained model? Or should I instead somehow increase the attention between “dog” and “noun” in the intermediate layers?
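One way to “connect” the embedding of “dog” to the embedding of “noun” without touching the encoder layers is to build the input vectors yourself and pass them through the `inputs_embeds` argument of `BertModel`, which replaces only the word-embedding lookup (position and token-type embeddings are still added inside the model). The sketch below is an assumption-laden illustration, not a tested recipe: `TaggedEmbeddings`, `num_tags`, `tag_dim` and all ids are hypothetical, the tag ids are assumed to come from an external tagger and be aligned with word-piece tokens, and a tiny randomly initialised config is used so it runs offline.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

# Tiny random BERT so the sketch runs offline; with real weights you would
# use BertModel.from_pretrained("bert-base-uncased") instead.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config)


class TaggedEmbeddings(nn.Module):
    """Hypothetical helper: [word embedding ; tag embedding] -> hidden_size."""

    def __init__(self, word_embeddings: nn.Embedding, num_tags: int, tag_dim: int):
        super().__init__()
        self.word_embeddings = word_embeddings  # reuse BERT's own (pre-trained) table
        self.tag_embeddings = nn.Embedding(num_tags, tag_dim)  # new, trained from scratch
        hidden = word_embeddings.embedding_dim
        self.project = nn.Linear(hidden + tag_dim, hidden)  # back to BERT's width

    def forward(self, input_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        words = self.word_embeddings(input_ids)  # (batch, seq, hidden)
        tags = self.tag_embeddings(tag_ids)      # (batch, seq, tag_dim)
        return self.project(torch.cat([words, tags], dim=-1))  # (batch, seq, hidden)


embedder = TaggedEmbeddings(bert.get_input_embeddings(), num_tags=5, tag_dim=8)

input_ids = torch.tensor([[2, 17, 42, 3]])  # made-up token ids
tag_ids = torch.tensor([[0, 2, 1, 0]])      # made-up tag ids, e.g. 2 = "noun"

outputs = bert(inputs_embeds=embedder(input_ids, tag_ids))
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 32])
```

A simpler variant that keeps the width unchanged is to add the tag embedding (size hidden_size) to the word embedding instead of concatenating, the same way BERT already sums token-type and position embeddings; either way the new parameters need fine-tuning.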

Here is an example of using tokenizer.add_special_tokens:

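```python
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

special_tokens_dict = {'cls_token': '<CLS>'}

num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
print('We have added', num_added_toks, 'tokens')
# The embedding matrix must grow to the new vocabulary size, len(tokenizer).
model.resize_token_embeddings(len(tokenizer))

assert tokenizer.cls_token == '<CLS>'
```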
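The difference between this and what the question asks for shows up in the shape of the embedding matrix: adding a special token and resizing adds a row (a new vocabulary entry), while the question wants a column (a new feature for every token). A minimal sketch, assuming a tiny randomly initialised GPT-2 config so nothing is downloaded; the sizes are arbitrary.

```python
from transformers import GPT2Config, GPT2Model

# Tiny random GPT-2 so this runs without downloading weights.
config = GPT2Config(vocab_size=50, n_positions=16, n_embd=8, n_layer=1, n_head=2)
model = GPT2Model(config)
print(model.get_input_embeddings().weight.shape)  # torch.Size([50, 8])

# Adding one token (e.g. via tokenizer.add_special_tokens) means resizing to vocab + 1:
# one more row, while the embedding width (columns) stays the same.
model.resize_token_embeddings(config.vocab_size + 1)
print(model.get_input_embeddings().weight.shape)  # torch.Size([51, 8])
```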

I found a solution here: How to use additional input features for NER?