Hello,
I’m trying to fine-tune M2M100 with the run_translation.py script, and the model does not seem to be improving.
I am using the following command:
deepspeed examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero3.json \
--model_name_or_path facebook/m2m100_418M \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--output_dir output_dir --overwrite_output_dir \
--fp16 \
--do_train --do_eval --do_predict \
--max_train_samples 500 --max_eval_samples 50 --max_predict_samples 50 \
--num_train_epochs 0.001 \
--dataset_name wmt16 --dataset_config "ro-en" \
--source_lang en --target_lang ro \
--predict_with_generate --forced_bos_token ro
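(To be clear, the command above is the scaled-down version I use for quick debugging — for the results below I set num_train_epochs to 1 or 3. As a rough sanity check that the debug run barely takes any optimizer steps, assuming a single GPU and approximating the Trainer's step rounding, which I'm not certain about:)

```python
import math

# Numbers taken from the command above; single GPU assumed.
max_train_samples = 500
per_device_train_batch_size = 8
num_train_epochs = 0.001

# Update steps per epoch, ignoring gradient accumulation.
steps_per_epoch = math.ceil(max_train_samples / per_device_train_batch_size)
# Approximate total update steps (actual Trainer rounding may differ slightly).
approx_update_steps = math.ceil(steps_per_epoch * num_train_epochs)

print(f"update steps per epoch: {steps_per_epoch}")        # 63
print(f"approx. total update steps: {approx_update_steps}")  # 1
```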
Just to give you an example: if I train for 1 epoch I get 20 BLEU points on the test set, but if I train for 3 epochs I get around 10 BLEU points.
Am I doing anything wrong? Does M2M100 require any specific hyperparameter configuration?
Thanks