How to add uppercase marker tokens (and the matching behaviour) to a tokenizer?

I've noticed that most tokenizers first convert the text to lowercase.
As a result, uppercase and lowercase forms of a word map to the same vocabulary entry, which is what I want: I don't want a separate vocabulary entry for a term with the same meaning.

However, this loses the information that a word was written in uppercase. For instance, all-uppercase words appear more often in titles, so the mere fact that a word is in uppercase can tell the model that it may be part of a title. Likewise, an uppercase first letter can indicate a proper name, a country, the beginning of a sentence, etc. So casing does carry valuable information.

So I think the best solution to this dilemma is to create one subword token indicating that the next word is entirely in uppercase, and another indicating that the next word starts with an uppercase letter, and to build this behaviour into the tokenizer itself.
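The proposed scheme can be sketched as a reversible pre-processing step in plain Python. This is a minimal illustration under stated assumptions, not a feature of any tokenizer library: the marker strings "<up>" and "<cap>" and the helper names "encode_case" / "decode_case" are hypothetical. Each all-uppercase word is lowercased and prefixed with "<up>", each capitalized word with "<cap>", and mixed-case words such as "iPhone" are left untouched so that ordinary text round-trips exactly.

```python
import re

# Hypothetical marker strings (assumed never to occur in the raw text).
UPPER = "<up>"  # the next word is entirely uppercase, e.g. "NASA"
CAP = "<cap>"   # only the first letter of the next word is uppercase, e.g. "Tom"

# A "word" here is a maximal run of letters (Unicode-aware, no digits or "_").
_WORD = re.compile(r"[^\W\d_]+")
_MARKED = re.compile("(" + re.escape(UPPER) + "|" + re.escape(CAP) + r")([^\W\d_]+)")


def _mark(match):
    word = match.group(0)
    if word.islower() or not word[0].isupper():
        # Already lowercase, mixed case like "iPhone", or a script without case.
        return word
    rest = word[1:]
    if not rest or rest.islower():
        return CAP + word.lower()  # "Tom" -> "<cap>tom", "I" -> "<cap>i"
    if word.isupper():
        return UPPER + word.lower()  # "NASA" -> "<up>nasa"
    return word  # other mixed case, e.g. "McDonald", is kept as-is


def _unmark(match):
    marker, word = match.group(1), match.group(2)
    if marker == UPPER:
        return word.upper()
    return word[0].upper() + word[1:]


def encode_case(text):
    """Lowercase cased words and put a case marker in front of each one."""
    return _WORD.sub(_mark, text)


def decode_case(text):
    """Undo encode_case by re-applying the case that each marker describes."""
    return _MARKED.sub(_unmark, text)


encoded = encode_case("Sir Tom Walters visited NASA.")
print(encoded)               # <cap>sir <cap>tom <cap>walters visited <up>nasa.
print(decode_case(encoded))  # Sir Tom Walters visited NASA.
```

Because markers are only inserted in front of words, the tokenizer sees lowercase forms of ordinary words (so "Tom" and "tom" share one vocabulary entry), while the case information survives as a separate token. The marker strings must not appear literally in the raw input, otherwise decoding becomes ambiguous.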

However, I have no idea how to do this without writing the tokenizer functions from scratch.
Is there an easy way to do it with some of the existing tokenizer classes?
In particular, I would like to work with SentencePiece tokenizers (rather than WordPiece).
The guides I have found don't go deep enough into customizing the tokenizer's behaviour.

Thanks in advance

You could use a checkpoint such as 'bert-base-cased', which preserves case:

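```python
from transformers import AutoTokenizer

checkpoint = 'bert-base-cased'  # cased checkpoint: the text is not lowercased
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

sentence = "Sir Tom Walters"
inp = tokenizer(sentence, return_tensors='pt')
# Map the input IDs back to token strings to inspect them
print(tokenizer.convert_ids_to_tokens(inp['input_ids'][0].tolist()))
# ['[CLS]', 'Sir', 'Tom', 'Walters', '[SEP]']
```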