How can I train M2M-100 or NLLB-200 on my parallel bilingual corpus?

affan321 · September 22, 2022, 3:16am

I am very new to NLP, so this may seem like a dumb question.
I want to take the pre-trained M2M-100 model and fine-tune it on a bilingual English-Urdu corpus.
Here is what I am trying to do … The point is that it takes a string as input for training … … but I can't pass the whole corpus of 2.6 million rows to the model as a single string.
It goes out of index.
It trains on 50 lines at most.
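
For what it's worth, the usual way around the "one big string" problem is to feed the corpus in small batches of sentence pairs rather than all at once. Below is a minimal sketch of that idea, not a tuned recipe. The helper names (`iter_batches`, `fine_tune`), the checkpoint `facebook/m2m100_418M`, and the hyperparameters are assumptions made for illustration, and the sketch assumes a reasonably recent `transformers` release that accepts `text_target` in the tokenizer call.

```python
from typing import Iterator, List, Tuple


def iter_batches(src_lines: List[str], tgt_lines: List[str],
                 batch_size: int = 16) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield aligned (source, target) slices of at most batch_size pairs."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    for start in range(0, len(src_lines), batch_size):
        yield (src_lines[start:start + batch_size],
               tgt_lines[start:start + batch_size])


def fine_tune(src_lines, tgt_lines, epochs=1, batch_size=16, lr=5e-5):
    """Plain PyTorch fine-tuning loop over mini-batches (hypothetical helper)."""
    # Imported lazily so the batching helper above works without these libraries.
    import torch
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    name = "facebook/m2m100_418M"  # assumed checkpoint; larger ones exist
    tokenizer = M2M100Tokenizer.from_pretrained(name, src_lang="en", tgt_lang="ur")
    model = M2M100ForConditionalGeneration.from_pretrained(name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()

    for _ in range(epochs):
        for src, tgt in iter_batches(src_lines, tgt_lines, batch_size):
            # Tokenize one small batch; long sentences are truncated to 128 tokens.
            batch = tokenizer(src, text_target=tgt, return_tensors="pt",
                              padding=True, truncation=True, max_length=128)
            # Padding positions in the labels should not contribute to the loss.
            batch["labels"][batch["labels"] == tokenizer.pad_token_id] = -100
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer
```

With 2.6 million rows you would probably also read the files lazily and train on a GPU, but the batching pattern stays the same.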

Can someone tell me what I am doing wrong?
Also, how do I save and load the weights?
How do I get the BLEU score?
Can anyone tell me whether improving one language will help other languages too?
For example, if we improve it for Urdu, will Persian and Arabic be affected?
Kindly help me. OUT.
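
On the saving, loading, and BLEU questions, here is a rough sketch under a few assumptions: the model and tokenizer are the M2M-100 classes from `transformers`, BLEU is computed with the `sacrebleu` package, and there is one reference translation per sentence. The function names (`save_and_reload`, `translate`, `bleu_score`) are made up for illustration.

```python
from typing import List


def save_and_reload(model, tokenizer, out_dir: str):
    """Write weights and tokenizer files to out_dir, then load them back."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return (M2M100ForConditionalGeneration.from_pretrained(out_dir),
            M2M100Tokenizer.from_pretrained(out_dir))


def translate(model, tokenizer, sentences: List[str],
              src_lang: str = "en", tgt_lang: str = "ur") -> List[str]:
    """Translate a small batch of sentences with generate()."""
    import torch
    tokenizer.src_lang = src_lang
    enc = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        # M2M-100 needs the target language id forced as the first generated token.
        out = model.generate(**enc,
                             forced_bos_token_id=tokenizer.get_lang_id(tgt_lang))
    return tokenizer.batch_decode(out, skip_special_tokens=True)


def bleu_score(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU with a single reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    import sacrebleu  # imported lazily; assumed to be installed
    # sacrebleu expects a list of reference streams, hence the extra list.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

A typical use would be to call `translate` on a held-out set of English sentences in small batches, collect the outputs, and pass them to `bleu_score` together with the Urdu references.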
