Hi everyone, I’m working on a project to fine-tune several text2text generation models (7B parameters and smaller) on Arabic, and I was wondering about the effect of the original tokenizer on the fine-tuning process. What happens if I use a tokenizer different from the model’s original one, say the BLOOM tokenizer? Will that hurt the model’s performance? If anyone has seen a paper discussing this or something similar, please drop it here, it would be really helpful, or simply share your thoughts.
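
To make the question concrete, here is roughly what I mean (a minimal sketch assuming a 🤗 Transformers setup; the model names are just placeholders, not my actual setup):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

base_model_name = "google/mt5-base"           # placeholder base model, not necessarily what I use
new_tokenizer_name = "bigscience/bloom-560m"  # checkpoint whose tokenizer is the BLOOM tokenizer

# Load the model with a tokenizer it was NOT pretrained with.
tokenizer = AutoTokenizer.from_pretrained(new_tokenizer_name)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name)

# The vocabulary sizes differ, so the embedding matrix has to be resized.
# Resizing only fixes the shape: the existing rows still correspond to the
# ORIGINAL tokenizer's IDs, so the new tokenizer's IDs map to embeddings that
# were never trained for those tokens. My question is how much this mismatch
# hurts, and whether fine-tuning on Arabic can recover from it.
model.resize_token_embeddings(len(tokenizer))
```
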