Custom Hugging Face tokenizer with a custom model for BERT

I am working with molecule data in a representation called SMILES. An example molecule string looks like Cc1ccccc1N1C(=O)NC(=O)C(=Cc2cc(Br)c(N3CCOCC3)o2)C1=O.

Now, I want a custom tokenizer that can be used with the Hugging Face Transformers APIs. I also do not want to use the existing tokenizer models such as BPE. I want the SMILES string parsed with a regex so that individual atoms and symbols become tokens, as follows:

import re

SMI_REGEX_PATTERN = r"""(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|

regex = re.compile(SMI_REGEX_PATTERN)

molecule = 'Cc1ccccc1N1C(=O)NC(=O)C(=Cc2cc(Br)c(N3CCOCC3)o2)C1=O'
tokens = regex.findall(molecule)
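
The pattern in the snippet above appears to be cut off after `=|`. The following is a minimal sketch that assumes the remaining alternatives cover bond symbols, charges, ring-closure digits and similar characters, as in the SMILES regex commonly used for this kind of tokenization. The completed pattern is an assumption, not something recovered from the post. The check at the end only confirms that joining the tokens reproduces the input, so no character is silently dropped.

```python
import re

# ASSUMPTION: everything after "=|" is a guessed completion of the pattern.
SMI_REGEX_PATTERN = (
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|"
    r"#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)

regex = re.compile(SMI_REGEX_PATTERN)

molecule = "Cc1ccccc1N1C(=O)NC(=O)C(=Cc2cc(Br)c(N3CCOCC3)o2)C1=O"

# findall returns the content of the single capture group for each match.
tokens = regex.findall(molecule)

# Round-trip check: the tokens must cover the whole string.
covers_input = "".join(tokens) == molecule
print(tokens[:9], covers_input)
```

Note that "Br" comes out as one token while "C" and "c" stay distinct, which is why case must be preserved.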

It is fairly simple to do the above, but I need a tokenizer that works with, let's say, the BERT API of Hugging Face. I also do not want lowercase conversion (in SMILES, c and C mean different things: aromatic versus aliphatic carbon), but I still want to use BERT.
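
To make the requirement concrete, here is a library-independent sketch of what such a tokenizer has to do: split with the regex, keep case, map tokens to ids, and wrap each sequence in BERT-style special tokens. The class name `SmilesRegexTokenizer`, its methods, the special-token order, and the completed regex pattern are all hypothetical and chosen for illustration. It does not use the Hugging Face API; wiring it in would presumably mean subclassing `PreTrainedTokenizer` and overriding its `_tokenize` hook, which is not shown here.

```python
import re

# ASSUMPTION: completed SMILES pattern, not taken from the post.
SMI_REGEX_PATTERN = (
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|"
    r"#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)

# BERT-style special tokens; the id order here is an arbitrary choice.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


class SmilesRegexTokenizer:
    """Hypothetical case-preserving regex tokenizer with a fixed vocab."""

    def __init__(self, corpus):
        self.regex = re.compile(SMI_REGEX_PATTERN)
        # Special tokens get the first ids, then tokens seen in the corpus.
        self.vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
        for smiles in corpus:
            for tok in self.tokenize(smiles):
                self.vocab.setdefault(tok, len(self.vocab))
        self.ids_to_tokens = {i: t for t, i in self.vocab.items()}

    def tokenize(self, smiles):
        # No lowercasing: "C" and "c" stay separate tokens.
        return self.regex.findall(smiles)

    def encode(self, smiles):
        tokens = ["[CLS]"] + self.tokenize(smiles) + ["[SEP]"]
        unk_id = self.vocab["[UNK]"]
        return [self.vocab.get(tok, unk_id) for tok in tokens]

    def decode(self, ids):
        tokens = [self.ids_to_tokens[i] for i in ids]
        return "".join(t for t in tokens if t not in SPECIAL_TOKENS)


molecule = "Cc1ccccc1N1C(=O)NC(=O)C(=Cc2cc(Br)c(N3CCOCC3)o2)C1=O"
tokenizer = SmilesRegexTokenizer([molecule])
ids = tokenizer.encode(molecule)
print(ids[:5], tokenizer.decode(ids) == molecule)
```

The resulting ids could be fed to a BERT model configured with a matching vocabulary size, but that part depends on the library and is left open here.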

The quicktour documentation here doesn't cover creating a custom model, as far as I can see.

Any help is highly appreciated.