Is it possible to use an external tokenizer, such as Python's standard tokenize module, with a CodeBERT model? If so, how?
Reference: tokenize, "Tokenizer for Python source" (Python 3.10.2 documentation)
I realize there is a pre-tokenizer option that can do something similar, but it still requires running a standard tokenizer afterwards. Is it possible to skip that step?
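
For concreteness, here is a minimal sketch of the kind of tokenization I would like to feed to the model directly. It uses only the standard library's `tokenize` module; the helper name `python_tokens` and the choice to drop layout tokens (newlines, indents, the end marker) are my own assumptions for illustration, not anything CodeBERT prescribes.

```python
import io
import tokenize
from typing import List

# Token types that carry layout information only (assumption: not wanted as model input).
_LAYOUT = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def python_tokens(source: str) -> List[str]:
    """Return the token strings that Python's own tokenizer finds in `source`."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in _LAYOUT
    ]


print(python_tokens("x = foo(1) + y\n"))
# prints ['x', '=', 'foo', '(', '1', ')', '+', 'y']
```

What I am unsure about is how to get from a token list like this to input IDs for the model without passing it through the model's own subword tokenizer again.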