Difference between Tokenizer and TokenizerFast

Hi,

I have searched for an answer to my question, but still can't find a clear one.

Some issues on GitHub and this forum also report that the results of the tokenizer and the fast tokenizer differ slightly.

I want to know: what is the difference between them in terms of mechanism?
If they are supposed to produce the same output, why do we need both of them?

hey @ad26kr can you provide a few links on the reported differences between the two types of tokenizers?

cc @anthony who is the tokenizer expert

@anthony
After carefully reading those posts, I found that most of the reported differences between tokenizers and fast tokenizers have been resolved.

However, I am still curious why the two are kept separate, since they differ greatly in computation speed. Why not just use the fast one?

oh that’s because we do not have Rust implementations + Python bindings for every type of tokenizer released by the various research groups. by default transformers will look for the fast implementation if it exists, or fall back to the “slow” one when it doesn’t
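To illustrate that fallback behavior, here is a minimal sketch using `AutoTokenizer` (assuming the `bert-base-uncased` checkpoint as an example, since it ships both implementations):

```python
from transformers import AutoTokenizer

# AutoTokenizer returns the fast (Rust-backed) implementation when one exists;
# use_fast=False forces the pure-Python ("slow") implementation.
fast = AutoTokenizer.from_pretrained("bert-base-uncased")
slow = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

print(fast.is_fast)   # True
print(slow.is_fast)   # False

# For checkpoints that ship both, the token ids should agree:
text = "Why do we need both tokenizers?"
print(fast(text)["input_ids"] == slow(text)["input_ids"])
```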


Hi @lewtun
I was just wondering whether the only difference between fast tokenizers and Python tokenizers is really just speed.
The reason I ask is that, for NLLB for example, the Python tokenizer is based on SentencePiece while the fast tokenizer is based on BPE. So I was wondering whether the two tokenizers are designed to produce the same output despite being based on different algorithms.
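Not an authoritative answer, but one way to check what actually backs a fast tokenizer is to inspect its `backend_tokenizer` from the `tokenizers` library. A sketch, assuming the `facebook/nllb-200-distilled-600M` checkpoint as an example:

```python
from transformers import AutoTokenizer

# A fast tokenizer wraps a tokenizers-library Tokenizer object; its `.model`
# attribute reveals which algorithm (BPE, Unigram, WordPiece, WordLevel) the
# original SentencePiece model was converted to.
tok = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
print(type(tok.backend_tokenizer.model).__name__)
```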