Select Source and Target Language in multi-language translation models

mohdyaser · April 4, 2023, 2:20pm

I want to fine-tune a Facebook NLLB model for translation. How can I specify the source and target languages for the tokenizer and the trainer?
Is there any other information I need to know too? I’m a beginner.

sanjeev-bhandari01 · August 14, 2024, 11:35am

You only need to pass the source and target language codes when you load the tokenizer. The rest is the same as fine-tuning any other translation model.

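from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_REPO = "facebook/nllb-200-1.3B"
# src_lang and tgt_lang take FLORES-200 codes (here: Japanese to English).
tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO, src_lang="jpn_Jpan", tgt_lang="eng_Latn")
# The model is loaded as usual; the language choice lives in the tokenizer.
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_REPO)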

For more detail on using the NLLB tokenizer, see: NLLB-200 - Hugging Face docs
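
To make the "rest" a bit more concrete, here is a minimal sketch of how the language codes are typically used during preprocessing and at generation time. It is an illustration under assumptions, not the only way to do it: the helper names (`preprocess_batch`, `translate`, `demo`) and the dataset column names `"source"` and `"target"` are hypothetical, so adapt them to your data. As far as I know the trainer itself takes no language argument; the languages enter through the tokenizer, and at inference through `forced_bos_token_id`.

```python
from typing import Any, Dict, List


def preprocess_batch(
    batch: Dict[str, List[str]],
    tokenizer: Any,
    src_lang: str = "jpn_Jpan",
    tgt_lang: str = "eng_Latn",
    max_length: int = 128,
) -> Dict[str, Any]:
    """Tokenize a batch of parallel sentences for seq2seq fine-tuning.

    Assumes hypothetical column names "source" and "target".
    """
    # The NLLB tokenizer reads these attributes to pick the language code
    # token it attaches to the source and target sequences.
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    # text_target tokenizes the references and returns them under "labels".
    return tokenizer(
        batch["source"],
        text_target=batch["target"],
        max_length=max_length,
        truncation=True,
    )


def translate(
    texts: List[str],
    tokenizer: Any,
    model: Any,
    src_lang: str = "jpn_Jpan",
    tgt_lang: str = "eng_Latn",
    max_new_tokens: int = 64,
) -> List[str]:
    """Translate texts, forcing the decoder to start in tgt_lang."""
    tokenizer.src_lang = src_lang
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **inputs,
        # The target language is chosen by forcing its code as first token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def demo() -> None:
    """Downloads the checkpoint, so it is not called automatically."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    repo = "facebook/nllb-200-1.3B"
    tokenizer = AutoTokenizer.from_pretrained(
        repo, src_lang="jpn_Jpan", tgt_lang="eng_Latn"
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)

    features = preprocess_batch(
        {"source": ["こんにちは"], "target": ["Hello"]}, tokenizer
    )
    print(features["labels"])
    print(translate(["こんにちは"], tokenizer, model))
```

With a `datasets.Dataset`, something like `dataset.map(preprocess_batch, batched=True, fn_kwargs={"tokenizer": tokenizer})` should produce the `input_ids` and `labels` columns that a seq2seq trainer expects, assuming your columns match the names above.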
