M2M model finetuning on multiple language pairs

Hi all, can anyone suggest how to fine-tune m2m100 on more than one language pair? I am able to fine-tune on a single language pair using the script below:

CUDA_VISIBLE_DEVICES=0,1,2,3,6 python -m torch.distributed.run --nproc_per_node=5 run_translation.py --model_name_or_path=m2m100_418M_new_token --do_train --do_eval --source_lang ja --target_lang en --fp16=True --evaluation_strategy epoch --output_dir bigfrall --per_device_train_batch_size=48 --per_device_eval_batch_size=48 --overwrite_output_dir --forced_bos_token "en" --train_file orig_manga/orig/train_exp_frame_50k.json --validation_file orig_manga/orig/valid_exp_frame_50k.json --tokenizer_name tokenizer_new_token --num_train_epochs 50 --save_total_limit=5 --save_strategy=epoch --load_best_model_at_end=True --predict_with_generate

But now I want to fine-tune it on both the ja-en and ja-zh pairs. How do I pass both target languages?

I’m also curious about this. @nikhiljais - did you ever work this out?

Not yet. Still waiting for some help.

I think I managed to do this, but my approach is really hacky and fragile, so I wouldn’t recommend it. I’ve filed a feature request with the Hugging Face Transformers team to improve this: https://github.com/huggingface/transformers/issues/15500

That feature request has a link to a Colab notebook with the code for how I did it. I believe it is working, but I’m not 100% sure.
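
For anyone who wants the general idea without opening the notebook, here is a minimal sketch of per-example language handling. It is not the notebook's code and I have not run it through a full training job. It assumes each record uses the same `{"translation": {...}}` layout that `run_translation.py` reads, that the source side is always `ja`, and that the tokenizer behaves like `M2M100Tokenizer` (settable `src_lang` / `tgt_lang`, and a `text_target` argument). The function name `encode_record` is made up for illustration.

```python
def encode_record(tokenizer, record, source_lang="ja", max_length=128):
    """Tokenize one mixed-pair record using that record's own target language.

    `record` is assumed to look like {"translation": {"ja": "...", "zh": "..."}}.
    """
    translation = record["translation"]

    # The target language is whichever key is not the source language.
    target_langs = [lang for lang in translation if lang != source_lang]
    if source_lang not in translation or len(target_langs) != 1:
        raise ValueError(f"expected '{source_lang}' plus one target language, got {list(translation)}")
    target_lang = target_langs[0]

    # Set the language codes per example, so the tokenizer can add the
    # matching language tokens to the source text and to the labels.
    tokenizer.src_lang = source_lang
    tokenizer.tgt_lang = target_lang

    return tokenizer(
        translation[source_lang],
        text_target=translation[target_lang],
        max_length=max_length,
        truncation=True,
    )
```

This would be applied to a dataset that mixes ja-en and ja-zh records (for example through a custom preprocessing function) rather than passed through the single `--source_lang` / `--target_lang` flags. At generation time the target language still has to be chosen per request, e.g. `forced_bos_token_id=tokenizer.get_lang_id("zh")`.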

Hi, @nikhiljais.

Did fine-tuning on one pair affect the quality of the other translation directions in your case?

I’m fine-tuning on a different language pair, and that pair works well, but none of the other directions work at all.