Fine-tuning of multilingual (translation) models

Hi guys,

I want to fine-tune a pre-trained multilingual model (MarianMT in this case) for domain-specific translation. The model should be able to translate between 5 different languages, and I have domain-specific datasets for every sentence pair (e.g. de-en, en-de, de-es, es-de and so on). The fine-tuning tutorials I could find only cover single language pairs (e.g. the pre-trained “Helsinki-NLP/opus-mt-en-roa” model is downloaded and then fine-tuned on an en-roa dataset). What I want to do instead is train the whole multilingual model, not just en-roa: mix the sentences of all the sentence pairs of my datasets into one big dataset and fine-tune the whole multilingual model on it. How can I achieve this? Is it possible to download the “whole” model, rather than just the language-pair models like en-roa? I hope someone can help me :)

Best regards,

Simon

Try this: https://github.com/masakhane-io/lafand-mt (MAFAND-MT)
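
In case a concrete starting point helps: as far as I know, Marian doesn’t publish a single checkpoint covering every direction between five arbitrary languages (the multilingual group checkpoints like opus-mt-en-roa mark the target language with a `>>xxx<<` token prepended to the source sentence), so a many-to-many model such as M2M-100 or mBART-50 may be closer to what you describe. Below is a minimal sketch of the mix-and-fine-tune setup with Hugging Face Transformers, assuming `facebook/m2m100_418M` as the base model; the file paths, column names, and hyperparameters are hypothetical placeholders, not from any tutorial.

```python
# Minimal sketch: mix several translation directions into one dataset and
# fine-tune a single many-to-many model. Assumes facebook/m2m100_418M as the
# base (Marian itself ships per-pair/per-group checkpoints). File names and
# the {"src": ..., "tgt": ...} record layout are hypothetical placeholders.
from datasets import load_dataset, concatenate_datasets
from transformers import (
    DataCollatorForSeq2Seq,
    M2M100ForConditionalGeneration,
    M2M100Tokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# One (src_lang, tgt_lang, file) entry per translation direction.
directions = [
    ("de", "en", "data/de-en.json"),
    ("en", "de", "data/en-de.json"),
    ("de", "es", "data/de-es.json"),
    ("es", "de", "data/es-de.json"),
    # ... remaining pairs
]

def make_preprocess(src_lang, tgt_lang):
    def preprocess(batch):
        # M2M-100 encodes the direction via language codes on the tokenizer,
        # so each direction's examples are tokenized with its own settings.
        tokenizer.src_lang = src_lang
        tokenizer.tgt_lang = tgt_lang
        return tokenizer(
            batch["src"], text_target=batch["tgt"],
            max_length=128, truncation=True,
        )
    return preprocess

parts = []
for src_lang, tgt_lang, path in directions:
    ds = load_dataset("json", data_files=path, split="train")
    parts.append(
        ds.map(make_preprocess(src_lang, tgt_lang),
               batched=True, remove_columns=ds.column_names)
    )

# Mix all directions into one training set; shuffling is what actually
# interleaves the pairs so every batch contains several directions.
train_dataset = concatenate_datasets(parts).shuffle(seed=42)

args = Seq2SeqTrainingArguments(
    output_dir="multilingual-domain-mt",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

If you stay with a multilingual Marian checkpoint instead, the same structure should work: swap in `MarianTokenizer`/`MarianMTModel` and, in `preprocess`, prepend the target-language token (e.g. `>>spa<<`) to each source sentence rather than setting `src_lang`/`tgt_lang`.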
