M2M100 12B performs worse than 1.2B

Hi!

I evaluated the out-of-the-box performance of different M2M100 versions on some custom datasets. I observed that facebook/m2m100-12B-last-ckpt and facebook/m2m100-12B-avg-5-ckpt perform much worse than facebook/m2m100_1.2B.

Do you know why this happens? Are the weights of the m2m100 12B model not yet finalized?
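
For reference, I generate translations with the standard M2M100 pattern from the transformers documentation; a minimal sketch (the checkpoint name and example sentence are just placeholders):

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Any of the checkpoints under comparison can be swapped in here, e.g.
# "facebook/m2m100_1.2B", "facebook/m2m100-12B-last-ckpt",
# "facebook/m2m100-12B-avg-5-ckpt".
model_name = "facebook/m2m100_1.2B"
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
tokenizer = M2M100Tokenizer.from_pretrained(model_name)

# M2M100 needs the source language set on the tokenizer, and the target
# language forced as the first generated token.
tokenizer.src_lang = "en"
inputs = tokenizer("The weather is nice today.", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("de"),
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```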

Thank you!

Hi, I have the same experience with the M2M 12B, 1.2B, and 400M versions. In my opinion, 12B truly outperforms on high-resource language pairs such as DE-EN and FR-EN, but on lower-resource languages its performance is not significantly different from 1.2B's. From my own experience, 1.2B actually translates best from and to Malay.

Thank you for your answer, @kinetical.

I evaluated the models on the English to German FLORES dataset (GitHub - facebookresearch/flores: Facebook Low Resource (FLoRes) MT Benchmark).

This is how the models perform:

  • facebook/m2m100_1.2B: 35.39 BLEU
  • facebook/m2m100-12B-avg-5-ckpt: 12.44 BLEU

So, the problem exists for high-resource language pairs as well: M2M100 12B's performance is much lower than 1.2B's.
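
For context, I score the model outputs against the FLORES references with sacreBLEU, roughly like this (the hypothesis and reference lists below are placeholders):

```python
import sacrebleu

# hypotheses: the model's translations, one per source sentence.
# references: the FLORES reference translations, in the same order.
hypotheses = ["Das Wetter ist heute schön."]
references = ["Das Wetter ist heute schön."]

# corpus_bleu expects a list of reference streams, hence the extra list.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```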

Thank you!

Hi @evroschris98, thank you for sharing your results.

Recently I had another opportunity to compare 12B and 1.2B, and I found that model capacity is the key difference between the two. For a single language pair, 1.2B outperformed 12B in almost every comparison. However, when I modified the code to fine-tune on several pairs and directions at once (a group of geographically neighboring languages), 12B really shines: it has more than enough capacity to actually "memorize" all these languages.
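
Concretely, the change was just in how each training example is encoded: the source and target languages are set per example, so a single batch can mix directions. A minimal sketch (the example pairs are placeholders, and this assumes a recent transformers version that supports text_target):

```python
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100-12B-avg-5-ckpt")

# Each training example carries its own direction, so one fine-tuning run
# can cover a whole group of neighboring languages in both directions.
examples = [
    {"src": "Selamat pagi.", "tgt": "Good morning.", "src_lang": "ms", "tgt_lang": "en"},
    {"src": "Good morning.", "tgt": "Selamat pagi.", "src_lang": "en", "tgt_lang": "ms"},
]

def encode(example):
    # The language codes must be set before tokenizing so the correct
    # language token is prepended to the inputs and the labels.
    tokenizer.src_lang = example["src_lang"]
    tokenizer.tgt_lang = example["tgt_lang"]
    return tokenizer(
        example["src"],
        text_target=example["tgt"],
        truncation=True,
        max_length=128,
    )

batch = [encode(e) for e in examples]
```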

Hi, @kinetical.

I'm interested to know whether fine-tuning M2M100 affected the quality of the other translation directions in your case.

I'm fine-tuning on one language pair, and that pair works well, but it breaks all the other directions.

And could you share your fine-tuning script? Maybe I'm doing something wrong.
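
For reference, my current setup looks roughly like this (heavily simplified sketch; the dataset wiring, output path, and hyperparameters are placeholders):

```python
from transformers import (
    DataCollatorForSeq2Seq,
    M2M100ForConditionalGeneration,
    M2M100Tokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/m2m100_1.2B"
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
# One fixed direction (en -> de) for the whole run.
tokenizer = M2M100Tokenizer.from_pretrained(model_name, src_lang="en", tgt_lang="de")

def preprocess(example):
    # Labels are built from text_target using the tokenizer's tgt_lang.
    return tokenizer(
        example["src"],
        text_target=example["tgt"],
        truncation=True,
        max_length=128,
    )

# train_dataset is assumed to be a datasets.Dataset with "src"/"tgt" columns:
# train_dataset = raw_dataset.map(preprocess, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="m2m100-en-de",
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    # train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
# trainer.train()
```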