What exactly is changed in a tokenizer after training it?

Are only the new words it learns added, or does anything else change (e.g., existing words removed, or more)?
I ask because after I trained the tokenizer, new words from the dataset were indeed added, but the performance of the model (which I used with this trained tokenizer) dropped. How can I debug what caused the performance drop (there was no change to the model before and after the tokenizer training)?
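For context, this is the kind of comparison I have in mind: diffing the old and new token-to-id mappings to see which tokens were added, removed, or re-indexed. The vocab dicts below are toy stand-ins; with a real (e.g., Hugging Face) tokenizer the mapping would come from something like `tokenizer.get_vocab()`.

```python
# Sketch: compare two token->id vocab mappings to see what retraining changed.
# The dicts below are made-up examples, not real tokenizer output.
old_vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}
new_vocab = {"[PAD]": 0, "[UNK]": 1, "world": 2, "dataset": 3}

added = set(new_vocab) - set(old_vocab)    # tokens only in the new vocab
removed = set(old_vocab) - set(new_vocab)  # tokens dropped by retraining
shared = set(old_vocab) & set(new_vocab)
# Shared tokens whose ids moved -- these no longer line up with the
# model's embedding rows, which could explain a performance drop.
reindexed = {t for t in shared if old_vocab[t] != new_vocab[t]}

print(sorted(added))      # tokens gained
print(sorted(removed))    # tokens lost
print(sorted(reindexed))  # tokens whose ids changed
```

With these toy dicts, `"dataset"` is added, `"hello"` is removed, and `"world"` keeps its string but gets a different id, so the model would look up the wrong embedding for it.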