Cross-Lingual Transfer Learning (XNLI)

I was reading this paper on XNLI.

I wanted to understand what TRANSLATE-TRAIN and TRANSLATE-TEST entail.

Here is what I have understood so far.

TRANSLATE-TRAIN: Here we train N models, where N = 15 is the number of languages, so there is one separate model per language. How do we test these models? Should we run each of the 15 models on every language's test set and record the average accuracy under each language? For example: we train 15 models, one per language, evaluate all 15 of them on the English test set, and then report the average of those accuracies as the English score. Does this sound right?
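To make the question concrete, here is a minimal sketch of the two ways I can read the evaluation. It assumes a hypothetical accuracy table `acc[train_lang][test_lang]`; the three languages, the function names, and the numbers are all placeholders I made up for illustration, not results from the paper. Reading A scores each model only on its own language's test set, while Reading B is the averaging scheme I described above.

```python
from statistics import mean

# Hypothetical accuracy table: acc[train_lang][test_lang] is the accuracy of the
# model trained on the train_lang data, evaluated on the test_lang test set.
# The numbers are placeholders for illustration, not results from any paper.
LANGS = ["en", "fr", "de"]  # the real setup would have 15 languages
acc = {
    "en": {"en": 0.80, "fr": 0.70, "de": 0.60},
    "fr": {"en": 0.70, "fr": 0.75, "de": 0.65},
    "de": {"en": 0.60, "fr": 0.65, "de": 0.70},
}


def own_language_scores(acc, langs):
    """Reading A: each model is scored only on its own language's test set."""
    return {lang: acc[lang][lang] for lang in langs}


def averaged_over_models_scores(acc, langs):
    """Reading B: every model is scored on each test set, then averaged per test language."""
    return {test: mean(acc[train][test] for train in langs) for test in langs}


reading_a = own_language_scores(acc, LANGS)
reading_b = averaged_over_models_scores(acc, LANGS)

for lang in LANGS:
    print(f"{lang}: reading A = {reading_a[lang]:.3f}, reading B = {reading_b[lang]:.3f}")
```

With 15 languages, Reading A needs 15 evaluation runs (one per model) and Reading B needs 15 x 15 = 225, so the two readings differ in cost as well as in the numbers they report.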

I have been struggling with this baseline for so long. 😦

https://arxiv.org/abs/1911.02116