I trained an M2M100 model on a technical/mechanical corpus in 6 languages, built from existing translated texts.
The training data should let the model learn the new vocabulary of this technical domain.
Unfortunately, at this point I have no evaluation data.
I will only get evaluation data once the trained model is used to translate new texts: a human will read these machine translations and indicate whether each one is correct or not (so this could become my validation data).
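BLEU is computed against reference translations, so a right/wrong flag alone is not enough to produce a BLEU score. The sketch below assumes the reviewer also supplies a corrected translation for each machine-translated sentence, which then serves as the reference. It is a minimal corpus-level BLEU under simplifying assumptions (whitespace tokenization, one reference per sentence, n-grams up to 4, no smoothing); the function names are illustrative, and a standard tool such as sacreBLEU is preferable for scores comparable with published results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumes one human-corrected reference per hypothesis, whitespace
    tokenization and no smoothing (any n-gram order with zero matches gives 0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes translations shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

For example, `corpus_bleu(model_outputs, corrected_outputs)` would score one batch of reviewed sentences; sentences the reviewer accepted unchanged simply use the model output as their own reference, which inflates the score, so the number is best read as a trend across model versions rather than an absolute quality measure.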
So I have some questions:
1/ How do I evaluate the BLEU score in this context?
2/ It is hard to know whether I have trained my models enough (how should I tune n_epochs?)
3/ What would you advise for my use case?
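One common way to choose n_epochs without waiting for human feedback, sketched here as an assumption rather than part of the setup described above, is to hold out a small slice of the existing parallel translations as a dev set, compute dev BLEU after each epoch, and stop once it has not improved for `patience` epochs, keeping the best checkpoint. The helpers `split_parallel` and `best_epoch` and the 5% split are hypothetical; the training and translation calls that produce the per-epoch scores are not shown.

```python
import random


def split_parallel(pairs, dev_fraction=0.05, seed=0):
    """Shuffle (source, target) pairs and hold out a fraction as a dev set."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_dev = max(1, int(len(pairs) * dev_fraction))
    return pairs[n_dev:], pairs[:n_dev]  # (train, dev)


def best_epoch(dev_scores, patience=2):
    """Early stopping on per-epoch dev BLEU (epochs are 1-indexed).

    Returns (epoch of the best checkpoint, epoch at which training stops).
    """
    best, best_ep = float("-inf"), 0
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best, best_ep = score, epoch
        elif epoch - best_ep >= patience:
            return best_ep, epoch
    return best_ep, len(dev_scores)
```

With dev scores of 20.1, 24.3, 25.0, 24.8 and 24.9 and a patience of 2, training would stop after epoch 5 and keep the epoch-3 checkpoint. If the model is trained with Hugging Face transformers, `EarlyStoppingCallback` implements the same idea; the human-reviewed translations can then serve as a separate test set.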