Difference between DistilBertTokenizerFast and DistilBertTokenizer?

ayalaall · May 4, 2021, 5:59am

Hi everyone,

I’m trying to understand the difference between DistilBertTokenizerFast and DistilBertTokenizer.

According to the documentation, DistilBertTokenizerFast will “Construct a “fast” DistilBERT tokenizer (backed by HuggingFace’s tokenizers library)”, whereas DistilBertTokenizer will “Construct a DistilBERT tokenizer”, but I don’t understand what that means.

This is what I found in the documentation: Tokenizer — transformers 4.5.0.dev0 documentation
As far as I understand, the only difference between a “fast” and a “non-fast” tokenizer is computation speed, and there is no functional difference between them. Please correct me if I’m on the wrong track.

Any help will be greatly appreciated.

Thank you,
Ayala

juroylim · July 10, 2021, 9:51am

@ayalaall I had the same thought as you.

I would also like to know this. Are they the same in terms of functionality?

BramVanroy · July 10, 2021, 10:27am

Fast tokenizers are implemented in Rust and are several times faster than the Python-based tokenizers. Apart from that, their encoding methods should behave the same. However, they are not functionally identical. From the docs:

When the tokenizer is a “Fast” tokenizer (i.e., backed by HuggingFace tokenizers library), this class provides in addition several advanced alignment methods which can be used to map between the original string (character and words) and the token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding to a given token).

Typically you want to use the fast tokenizer if it is available.
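
To make the difference concrete, here is a minimal sketch rather than an official example. It assumes a transformers 4.x install and builds both tokenizers from a tiny made-up vocabulary (the `vocab` list and the sample sentence are hypothetical) so that nothing has to be downloaded. It should show that both classes produce the same `input_ids`, while only the fast one exposes the alignment helpers mentioned in the docs.

```python
import os
import tempfile

from transformers import DistilBertTokenizer, DistilBertTokenizerFast

# Hypothetical toy WordPiece vocabulary, used only for illustration.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "hello", "world", "token", "##izer", "##s"]

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    # Both classes accept the same vocab file; the vocabulary is loaded at construction.
    slow = DistilBertTokenizer(vocab_path)
    fast = DistilBertTokenizerFast(vocab_path)

text = "Hello tokenizers world"

# The plain encoding is expected to match between the two implementations.
slow_ids = slow(text)["input_ids"]
fast_ids = fast(text)["input_ids"]
print(slow.is_fast, fast.is_fast)
print(slow_ids == fast_ids)

# Alignment helpers are only available on the fast (Rust-backed) tokenizer.
enc = fast(text, return_offsets_mapping=True)
offsets = enc["offset_mapping"]          # (start, end) character span per token
word_ids = enc.word_ids()                # index of the source word per token
token_index = enc.char_to_token(8)       # token that covers character 8 of `text`

for tok, (start, end) in zip(enc.tokens(), offsets):
    print(tok, repr(text[start:end]))
```

With a real checkpoint you would normally load the tokenizers with `from_pretrained` instead of a hand-written vocab file; the alignment methods are used the same way.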
