Fine-tuning an NLLB model for a new language

Unfortunately, no. I tried deleting and recreating the token IDs as suggested; at first this produced a different ordering of the language token IDs, and even after correcting the ordering I ended up in essentially the same situation as before. I also dug around in the tokenizer internals, but I still don't understand the root cause of the issue.
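
For anyone following along, here is a minimal sketch of the append-only variant (adding the new language code as a fresh token at the end of the vocabulary, rather than deleting and recreating existing IDs). The checkpoint name and the language code `xyz_Latn` are placeholders, not what I actually used; this just illustrates the general mechanism, not a confirmed fix for the problem above:

```python
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

model_name = "facebook/nllb-200-distilled-600M"  # placeholder checkpoint
tokenizer = NllbTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

new_lang = "xyz_Latn"  # hypothetical code for the new language

# Extend (rather than replace) the list of additional special tokens,
# so the existing NLLB language codes keep their positions and the new
# code gets a fresh ID at the end of the vocabulary.
tokenizer.add_special_tokens(
    {"additional_special_tokens": tokenizer.additional_special_tokens + [new_lang]}
)

# Grow the embedding matrix so the new ID has a (randomly initialized)
# row that can be trained during fine-tuning.
model.resize_token_embeddings(len(tokenizer))

print(new_lang, "->", tokenizer.convert_tokens_to_ids(new_lang))
```

Appending keeps the existing language-token block untouched, which is presumably why deleting and recreating IDs reshuffled the ordering in the first place.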
