So let’s say I do
GPT2TokenizerFast.from_pretrained('gpt2-medium')
vs GPT2TokenizerFast.from_pretrained('distilgpt2')
Is there actually any difference in their tokenized output?
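In other words, something like this quick check is what I have in mind (the sample sentence and variable names are just illustrative):

from transformers import GPT2TokenizerFast

# Load both tokenizers and encode the same text, then compare the token ids.
tok_medium = GPT2TokenizerFast.from_pretrained('gpt2-medium')
tok_distil = GPT2TokenizerFast.from_pretrained('distilgpt2')

text = "Hello world, this is a tokenization test."
ids_medium = tok_medium(text)['input_ids']
ids_distil = tok_distil(text)['input_ids']
print(ids_medium)
print(ids_distil)
print(ids_medium == ids_distil)  # would they ever differ?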
In that particular case, I don’t think so, but there are definitely cases where tokenizers from the same model type but different pretrained configurations are different. bert-base-uncased vs bert-base-cased would be one clear example.
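For instance, a quick check along these lines (the sample sentence is just illustrative) shows that the uncased tokenizer lower-cases the input before applying WordPiece, so the two produce different tokens:

from transformers import BertTokenizerFast

# The uncased and cased checkpoints ship different vocabularies and
# different pre-tokenization (lower-casing), so outputs diverge.
tok_uncased = BertTokenizerFast.from_pretrained('bert-base-uncased')
tok_cased = BertTokenizerFast.from_pretrained('bert-base-cased')

text = "Paris is the capital of France."
print(tok_uncased.tokenize(text))  # e.g. ['paris', 'is', 'the', ...]
print(tok_cased.tokenize(text))    # e.g. ['Paris', 'is', 'the', ...]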
Thank you for your clarification