How to instill auxiliary information coupled with words into transformers?

Assume that auxiliary information is attached to some of the words in a text. Our goal is to use this information when finetuning a model on downstream tasks.

Specifically, we want to finetune BERT or GPT-2 on texts annotated with named entities. For instance, we want to feed the transformer “Jim (Person) bought 300 shares of Acme Corp. (Organization) in 2006 (Time)”, i.e., a text with named-entity labels, instead of the plain “Jim bought 300 shares of Acme Corp. in 2006”.
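To make the input format concrete, here is a minimal sketch of how such an annotated string can be built from character-level entity spans. The span format `(start, end, label)` with an exclusive end and the function name `annotate_inline` are hypothetical choices for illustration, not part of any library.

```python
from typing import List, Tuple

# Hypothetical span format: (start_char, end_char_exclusive, label).
Span = Tuple[int, int, str]

def annotate_inline(text: str, spans: List[Span]) -> str:
    """Insert " (Label)" right after each entity span, left to right."""
    out = []
    prev = 0
    for start, end, label in sorted(spans):  # sort by start offset
        out.append(text[prev:end])           # text up to the end of the entity
        out.append(f" ({label})")            # the inline annotation
        prev = end
    out.append(text[prev:])                  # trailing text after the last entity
    return "".join(out)

text = "Jim bought 300 shares of Acme Corp. in 2006"
spans = [(0, 3, "Person"), (25, 35, "Organization"), (39, 43, "Time")]
print(annotate_inline(text, spans))
# → Jim (Person) bought 300 shares of Acme Corp. (Organization) in 2006 (Time)
```

Note that the spans themselves (which characters each label covers) are discarded once the labels are inlined; only their textual position remains.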

Note that such auxiliary information, e.g., named entities, is coupled with specific words in most cases.

If we feed the above “annotated” sentence as plain text, a pretrained tokenizer may break both the words and the labels into subword pieces. Hence, nothing in the resulting token sequence tells the model that an annotation, e.g., Organization, refers to its corresponding word span, e.g., Acme Corp.
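The fragmentation can be illustrated with a toy greedy longest-match-first splitter in the style of WordPiece. This is a minimal sketch under loud assumptions: `VOCAB`, `wordpiece` and `tokenize` are hypothetical names, and the tiny vocabulary is invented so that some words split; a real BERT or GPT-2 vocabulary is far larger and splits different words.

```python
import re
from typing import List

# Hypothetical toy vocabulary, chosen only to force some subword splits.
VOCAB = {"jim", "(", ")", "person", "bought", "300", "shares", "of", "ac", "##me",
         "corp", ".", "organ", "##ization", "in", "2006", "time"}

def wordpiece(word: str, vocab=VOCAB) -> List[str]:
    """Greedy longest-match-first subword split; non-initial pieces get '##'."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:  # no prefix of the remainder is in the vocabulary
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

def tokenize(text: str) -> List[str]:
    # Lowercase, split into words and punctuation, then split each into subwords.
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    return [p for w in words for p in wordpiece(w)]

print(tokenize("Jim (Person) bought 300 shares of Acme Corp. (Organization) in 2006 (Time)"))
# → ['jim', '(', 'person', ')', 'bought', '300', 'shares', 'of', 'ac', '##me', 'corp', '.',
#    '(', 'organ', '##ization', ')', 'in', '2006', '(', 'time', ')']
```

In this flat sequence, nothing records that the pieces `organ`, `##ization` describe the span `ac`, `##me`, `corp`, `.`; the model can only infer that from adjacency.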

What is the standard practice for instilling auxiliary information that is coupled with specific words in a sentence into a transformer?