M2M100 training does not improve model performance

Hello,
I’m trying to fine-tune M2M100 with the run_translation.py script, but the model does not seem to improve.
I am using the following command:

```bash
deepspeed examples/pytorch/translation/run_translation.py \
    --deepspeed tests/deepspeed/ds_config_zero3.json \
    --model_name_or_path facebook/m2m100_418M \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --output_dir output_dir --overwrite_output_dir \
    --fp16 \
    --do_train --do_eval --do_predict \
    --max_train_samples 500 --max_eval_samples 50 --max_predict_samples 50 \
    --num_train_epochs 0.001 \
    --dataset_name wmt16 --dataset_config "ro-en" \
    --source_lang en --target_lang ro \
    --predict_with_generate --forced_bos_token ro
```
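
For context, my understanding of `--forced_bos_token ro` is that M2M100 must be told which language to generate by forcing the target-language token as the first decoder token. Here is a minimal sketch of that mechanism using the standard transformers generation API (the example sentence is arbitrary, not from my data):

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="en")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

inputs = tokenizer("The weather is nice today.", return_tensors="pt")
# Without forcing the target-language id, M2M100 does not know which
# language to translate into; get_lang_id("ro") returns the id of __ro__.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("ro"),
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```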

Just to give an example: if I train for 1 epoch (changing --num_train_epochs in the command above) I get about 20 BLEU on the test set, but if I train for 3 epochs I get only around 10 BLEU.
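
The BLEU numbers above come from the test-set metrics the script reports with --do_predict --predict_with_generate. As a rough cross-check, this is how one can recompute them; a sketch assuming sacrebleu, where generated_predictions.txt is the file run_translation.py writes into --output_dir and refs.ro is a hypothetical file with one Romanian reference per line:

```python
import sacrebleu

# generated_predictions.txt: one generated translation per line,
# written by run_translation.py under --predict_with_generate.
with open("output_dir/generated_predictions.txt") as f:
    hypotheses = [line.strip() for line in f]

# refs.ro: hypothetical references file, aligned line-by-line
# with the predictions above.
with open("refs.ro") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```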
Am I doing anything wrong? Does M2M100 require any specific hyperparameter configuration?
Thanks