I understand that there are a few different tokenizers; e.g., DistilBertTokenizerFast, etc. However, I don’t understand the concept of “Auto” in tokenizer selection. Using IMDB text as an example, what do I get if I use AutoTokenizer to tokenize the text?
The AutoTokenizer will work on any checkpoint and pick the proper architecture for you (whereas DistilBertTokenizerFast will only work for DistilBERT checkpoints).
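To make the "Auto" idea concrete, here is a toy sketch of the dispatch pattern in plain Python. It is not the transformers source code: every class name, the registry, and the checkpoint table below are hypothetical stand-ins, and the "tokenizers" just split on whitespace. The only point is that the checkpoint's declared model type decides which tokenizer class gets instantiated.

```python
# Toy illustration of the "Auto" dispatch idea. NOT the transformers implementation.
# Every name here (Toy* classes, TOKENIZER_REGISTRY, CHECKPOINT_CONFIGS) is hypothetical.


class ToyDistilBertTokenizer:
    """Stand-in for an architecture-specific tokenizer (lowercases, then splits)."""

    def tokenize(self, text):
        return text.lower().split()


class ToyGPT2Tokenizer:
    """Stand-in for a different architecture's tokenizer (keeps case)."""

    def tokenize(self, text):
        return text.split()


# Maps an architecture name to the tokenizer class that handles it.
TOKENIZER_REGISTRY = {
    "distilbert": ToyDistilBertTokenizer,
    "gpt2": ToyGPT2Tokenizer,
}

# Stand-in for the config each checkpoint ships with; it names the architecture.
CHECKPOINT_CONFIGS = {
    "distilbert-base-uncased": {"model_type": "distilbert"},
    "gpt2": {"model_type": "gpt2"},
}


class ToyAutoTokenizer:
    """Looks up the checkpoint's architecture and returns the matching tokenizer."""

    @classmethod
    def from_pretrained(cls, checkpoint):
        model_type = CHECKPOINT_CONFIGS[checkpoint]["model_type"]
        tokenizer_class = TOKENIZER_REGISTRY[model_type]
        return tokenizer_class()


tok = ToyAutoTokenizer.from_pretrained("distilbert-base-uncased")
print(type(tok).__name__)
print(tok.tokenize("This movie was GREAT"))
```

With the real library the usage has the same shape: AutoTokenizer.from_pretrained("distilbert-base-uncased") typically hands back a DistilBertTokenizerFast instance (assuming the fast tokenizers backend is installed), so for your IMDB text the tokenized output should match what you would get by loading DistilBertTokenizerFast from that same checkpoint yourself.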
Thanks. In this case, are there any reasons to use DistilBertTokenizerFast directly?