I am very new to NLP, so this may seem like a dumb question.
I want to take the pre-trained M2M-100 model and fine-tune it on a bilingual English-Urdu corpus.
Here I am trying to do … The point is that it takes a string as input for training … … but I can’t pass the whole corpus of 2.6 million rows to the model as a single string.
It fails with an index out of range error.
It trains on 50 lines at most.
Can someone tell me what I am doing wrong?
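To make the question concrete, this is roughly the batching I think I need instead of one giant string. It is only a sketch: `batched_pairs` is a hypothetical helper name of my own, the batch size of 32 is arbitrary, and the tokenizer call in the comment is my assumption about how each batch would be fed to the model (I have not run that part).

```python
from itertools import islice

def batched_pairs(src_lines, tgt_lines, batch_size=32):
    """Yield (english_batch, urdu_batch) lists of at most batch_size sentences.

    src_lines and tgt_lines can be any iterables of aligned sentences,
    for example two open file handles, so the corpus is never held in
    memory as one string.
    """
    pairs = zip(src_lines, tgt_lines)
    while True:
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        src, tgt = zip(*chunk)
        yield list(src), list(tgt)

# Tiny in-memory stand-in for the real 2.6 million row corpus.
english = [f"sentence {i}" for i in range(5)]
urdu = [f"jumla {i}" for i in range(5)]

for en_batch, ur_batch in batched_pairs(english, urdu, batch_size=2):
    # Assumption: each small batch would be tokenized with truncation, e.g.
    # tokenizer(en_batch, text_target=ur_batch, padding=True,
    #           truncation=True, max_length=128, return_tensors="pt")
    print(len(en_batch), len(ur_batch))
```

My guess is that truncating each sentence to a maximum length is what avoids the index error, but I would like someone to confirm that.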
Also, how do I save and load the weights?
How do I get the BLEU score?
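From what I have read, BLEU combines clipped n-gram precisions (1 to 4 words) with a brevity penalty for translations that are too short. Below is a sketch of my understanding, under several simplifying assumptions: one reference per sentence, plain whitespace tokenization, and no smoothing. The names `ngrams` and `corpus_bleu` are my own. I assume an established tool such as sacreBLEU is the right choice for reporting real scores, since hand-rolled tokenization makes numbers hard to compare.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, single reference, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            totals[n - 1] += sum(h_ng.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(round(score, 2))
```

Is this the right idea, and is this how people evaluate M2M-100 on a held-out test set?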
Can anyone tell me whether improving the model on one language will help other languages too?
For example, if we improve it on Urdu… will Persian and Arabic be affected at all?
Kindly help me. OUT.