Oh, that’s because we do not have Rust implementations + Python bindings for every type of tokenizer released by the various research groups. By default, Transformers will look for the fast implementation if it exists, or fall back to the “slow” one when it doesn’t.
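For instance, you can check which implementation got loaded via `is_fast`, and force the slow one with `use_fast=False` (a minimal sketch; the NLLB checkpoint name here is just an example):

```python
from transformers import AutoTokenizer

# By default the fast (Rust-backed) tokenizer is loaded when one exists
tok_fast = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
print(tok_fast.is_fast)  # True

# Force the "slow" (pure Python) implementation instead
tok_slow = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", use_fast=False)
print(tok_slow.is_fast)  # False
```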
Hi @lewtun
I was just wondering whether the only difference between fast tokenizers and Python tokenizers is really just speed?
The reason I ask is that for NLLB, for example, I saw that the Python tokenizer is based on SentencePiece while the fast tokenizer is based on BPE. So I was wondering whether the output of the two tokenizers is designed to be the same despite the difference in what they are based on?
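For example, this is roughly the kind of comparison I have in mind (just a sketch, assuming the facebook/nllb-200-distilled-600M checkpoint):

```python
from transformers import AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"  # example NLLB checkpoint
tok_fast = AutoTokenizer.from_pretrained(checkpoint)                  # fast (Rust) tokenizer
tok_slow = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)  # slow (Python) tokenizer

text = "Tokenizers are fun!"
fast_ids = tok_fast(text)["input_ids"]
slow_ids = tok_slow(text)["input_ids"]
print(fast_ids)
print(slow_ids)
print(fast_ids == slow_ids)  # do the two implementations agree on this input?
```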