Does the tokenizer change during model training?

I want to know whether the tokenizer alters itself during model training (i.e., does it also train?), or whether it remains the same as the original one.
For example, I used
AutoTokenizer.from_pretrained(model_name, local_files_only=False)
After model training, is there any change to the tokenizer?


Hi, Talha,

The tokenizer's ids do not change. For example, the token id of [SEP] is 102, and it stays 102 throughout training. What does change are the embedding vectors (and hence the hidden states) associated with each id.
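To make the distinction concrete, here is a minimal toy sketch (not the real BERT tokenizer, just a stand-in vocabulary): the string-to-id mapping is a frozen lookup table, while training only moves the embedding vectors that the ids index into.

```python
# Toy stand-in for a tokenizer vocab: strings -> fixed integer ids.
# Training never touches this table.
vocab = {"[CLS]": 101, "[SEP]": 102, "hello": 7592}

def tokenize(tokens):
    # Pure lookup in the frozen vocab; same input -> same ids, always.
    return [vocab[t] for t in tokens]

ids_before = tokenize(["[CLS]", "hello", "[SEP]"])

# What training updates is the embedding table (id -> vector),
# simulated here as a dict of small vectors modified in place.
embeddings = {i: [0.0, 0.0] for i in vocab.values()}
embeddings[102] = [0.5, -0.3]  # a gradient step moves this vector

ids_after = tokenize(["[CLS]", "hello", "[SEP]"])
assert ids_before == ids_after  # ids are identical before and after
print(ids_before)
```

So [SEP] maps to 102 before, during, and after fine-tuning; only `embeddings[102]` changes.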


One doubt. Let's say I have fine-tuned two models from “bert-base-uncased” for different classification tasks (with the same number of classes).
Can I use the tokenizer saved with model 1 to tokenize inputs for model 2? Would the result differ from using each model's own tokenizer?
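Since fine-tuning does not modify the tokenizer, both models carry byte-identical copies of the original bert-base-uncased tokenizer, so they are interchangeable. A toy sketch of that claim, using a stand-in vocabulary (the ids for "great" and "movie" are illustrative, not the real WordPiece ids; [CLS]=101 and [SEP]=102 match BERT):

```python
# Both fine-tuned models inherit an unmodified copy of the same
# base tokenizer, so either copy yields identical ids.
base_vocab = {"[CLS]": 101, "[SEP]": 102, "great": 2307, "movie": 3185}

tokenizer_model1 = dict(base_vocab)  # copy saved with model 1
tokenizer_model2 = dict(base_vocab)  # copy saved with model 2

def encode(vocab, tokens):
    return [vocab[t] for t in tokens]

tokens = ["[CLS]", "great", "movie", "[SEP]"]
assert encode(tokenizer_model1, tokens) == encode(tokenizer_model2, tokens)
print("interchangeable:", encode(tokenizer_model1, tokens))
```

This would stop holding only if you changed the tokenizer yourself, e.g. by adding new tokens with `tokenizer.add_tokens(...)` and resizing the model's embedding matrix, in which case the two tokenizers would genuinely differ.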