AutoTokenizer vs. regular Tokenizer

vett93 · May 31, 2021, 11:46pm

I understand that there are a few different tokenizers, e.g., DistilBertTokenizerFast. However, I don’t understand the concept of “Auto” in tokenizer selection. Using IMDB text as an example, what do I get if I use AutoTokenizer to tokenize the text?

sgugger · June 1, 2021, 1:02pm

The AutoTokenizer will work on any checkpoint and pick the proper architecture for you (whereas DistilBertTokenizerFast will only work for DistilBERT checkpoints).
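To illustrate what "pick the proper architecture" means, here is a toy sketch of the dispatch idea: read the model type from the checkpoint's config, then look up the matching tokenizer class. It is not the transformers implementation. `ToyAutoTokenizer`, `ToyDistilBertTokenizer`, `ToyGPT2Tokenizer`, `TOKENIZER_REGISTRY`, and `from_config` are hypothetical names made up for this example, and the config is passed in as a JSON string rather than downloaded. In the real library, the corresponding call would be `AutoTokenizer.from_pretrained("distilbert-base-uncased")`, which by default returns a `DistilBertTokenizerFast` instance.

```python
import json


# Hypothetical stand-ins for concrete tokenizer classes (not the real ones).
class ToyDistilBertTokenizer:
    def __init__(self, checkpoint):
        self.checkpoint = checkpoint


class ToyGPT2Tokenizer:
    def __init__(self, checkpoint):
        self.checkpoint = checkpoint


# Assumed mapping from a config's "model_type" to a tokenizer class.
TOKENIZER_REGISTRY = {
    "distilbert": ToyDistilBertTokenizer,
    "gpt2": ToyGPT2Tokenizer,
}


class ToyAutoTokenizer:
    """Toy 'Auto' class: chooses a concrete tokenizer from the checkpoint config."""

    @classmethod
    def from_config(cls, checkpoint, config_json):
        # Read the architecture name stored with the checkpoint.
        model_type = json.loads(config_json)["model_type"]
        if model_type not in TOKENIZER_REGISTRY:
            raise ValueError(f"No tokenizer registered for model type {model_type!r}")
        # Instantiate the class that matches the architecture.
        return TOKENIZER_REGISTRY[model_type](checkpoint)


tok = ToyAutoTokenizer.from_config(
    "distilbert-base-uncased", '{"model_type": "distilbert"}'
)
print(type(tok).__name__)
```

So the object you get back is an ordinary architecture-specific tokenizer; the "Auto" class only decides which one to build.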

vett93 · June 2, 2021, 5:30pm

Thanks. In this case, are there reasons to use DistilBertTokenizerFast?
