Hello,
The title is self-explanatory. I have access to a published BERT model with its own custom tokenizer. I have the vocab file, plus Python functions that take a text, tokenize it according to the vocab, do some post-processing, and convert the tokens to IDs accepted by the BERT model.
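To make the setup concrete, here is a minimal sketch of the kind of pipeline I mean. Everything in it is a placeholder: the function names (`load_vocab`, `custom_tokenize`, `post_process`, `encode`), the lowercase-and-split logic, and the special tokens are assumptions for illustration, not the actual published code.

```python
from typing import Dict, List


def load_vocab(lines: List[str]) -> Dict[str, int]:
    """Map each token to its line index, as in a BERT-style vocab file."""
    return {token.strip(): index for index, token in enumerate(lines)}


def custom_tokenize(text: str) -> List[str]:
    # Placeholder for the real tokenization rules: lowercase, then split on whitespace.
    return text.lower().split()


def post_process(tokens: List[str]) -> List[str]:
    # Placeholder post-processing: wrap the sequence in BERT-style special tokens.
    return ["[CLS]"] + tokens + ["[SEP]"]


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Run the full pipeline: tokenize, post-process, then look up IDs."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(token, unk_id) for token in post_process(custom_tokenize(text))]


# Toy vocab standing in for the real vocab file (one token per line).
vocab = load_vocab(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"])
ids = encode("Hello there world", vocab)
print(ids)
```

The point is that the vocab lookup is standard, while the tokenization and post-processing steps are custom Python code that would need to be preserved in whatever replaces it.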
I would like to convert the custom tokenizer, which is a hassle to work with, into a Hugging Face tokenizer so I can use all of the amazing functionality that Transformers and other Hugging Face-based libraries provide. I already managed to convert the BERT model to a BertModel, but the tokenizer seems to be trickier.
Is there a way to convert this non-Hugging Face tokenizer into a Hugging Face tokenizer? Is PreTrainedTokenizer what I need to use?