Fine-tune a pretrained Hugging Face translation model on a new language pair

Is it possible to fine-tune a pretrained Hugging Face multilingual translation model (e.g., NLLB) on a new language pair, where one of the languages is already covered by the pretrained model (say, English) and the other is not?
If so, is the procedure the same as usual, i.e., create a dataset and write a training/fine-tuning script?


Here’s one possibly helpful resource: How to fine-tune a NLLB-200 model for translating a new language | by David Dale | Medium

There are some additional steps compared to the usual fine-tuning process, e.g. adding a token for the new language to the tokenizer.
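
Very roughly, that step could look something like this (untested sketch; `xxx_Latn` stands in for whatever code you pick for the new language, and `rus_Cyrl` is just an arbitrary example of a related language to warm-start from — the article goes about this with more careful custom logic):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Register a code for the unseen language as a special token,
# then grow the embedding matrix to cover it.
tokenizer.add_tokens(["xxx_Latn"], special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

# Optionally initialise the new code's embedding from a related language
# the model already knows, instead of leaving it random.
new_id = tokenizer.convert_tokens_to_ids("xxx_Latn")
related_id = tokenizer.convert_tokens_to_ids("rus_Cyrl")
with torch.no_grad():
    model.get_input_embeddings().weight[new_id] = model.get_input_embeddings().weight[related_id]
```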

The article linked above uses a good amount of custom code, though; I'm still looking around for something more Hugging Face-centric.
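
That said, once the tokenizer and embeddings are extended, the rest can follow the standard Hugging Face seq2seq recipe. Continuing from the snippet above (reusing `tokenizer` and `model`), a rough, untested sketch with `Seq2SeqTrainer` might look like this — the JSON file, column names, and hyperparameters are all placeholders:

```python
from datasets import load_dataset
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments

# Parallel data with one English and one new-language column per example.
dataset = load_dataset("json", data_files={"train": "train.jsonl"})

tokenizer.src_lang = "eng_Latn"   # English side (already known to NLLB)
tokenizer.tgt_lang = "xxx_Latn"   # the code added above

def preprocess(batch):
    # text_target tokenizes the labels with the target-language special tokens.
    return tokenizer(batch["en"], text_target=batch["xxx"], truncation=True, max_length=128)

tokenized = dataset.map(preprocess, batched=True, remove_columns=["en", "xxx"])

args = Seq2SeqTrainingArguments(
    output_dir="nllb-finetuned-xxx",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

At inference time you'd also want to force the target language, e.g. by passing `forced_bos_token_id=tokenizer.convert_tokens_to_ids("xxx_Latn")` to `model.generate()`, since NLLB selects the output language that way.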