I have created a custom tokenizer with the tokenizers library, roughly following its guide (The tokenization pipeline — tokenizers documentation). However, a bare Tokenizer built this way lacks utilities such as converting encoded sentences to torch tensors and so on. For that I'd like to use the PreTrainedTokenizerFast class. It exposes an interface for loading the tokenizers of various pretrained models from Google/Facebook/etc., but I want to use my own tokenizer instead.
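For context, my setup looks roughly like the sketch below (the WordLevel model, Whitespace pre-tokenizer, and toy training corpus are just placeholders for my actual pipeline):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Placeholder for my custom tokenizer: a WordLevel model with
# whitespace pre-tokenization, trained on a tiny toy corpus.
tokenizer = Tokenizer(models.WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(["hello world", "hello there"], trainer=trainer)

# Encoding works, but the result is a plain Python list of ids,
# not a torch tensor and not a batched/padded structure.
encoding = tokenizer.encode("hello world")
print(encoding.ids)
```

This is exactly the gap: `encoding.ids` is a plain list, with none of the tensor-conversion conveniences that PreTrainedTokenizerFast provides.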
How do I create a PreTrainedTokenizerFast from my own tokenizer?