I want to get embeddings from facebook/m2m100_418M for a specific language.
Let’s say I have a French sentence and a manual English translation of it. I compute the input (encoder) embeddings for the established direction, French → English, then calculate the cosine similarity between the two embeddings and get a very high score.
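The setup described above might look roughly like the following minimal sketch. It assumes the Hugging Face `transformers` M2M100 classes, with `torch` and `sentencepiece` installed. Setting the tokenizer's `src_lang` prepends a language token, and the encoder returns one vector per token. Mean pooling over the attention mask, which turns those into one sentence vector, is an assumption, since the post doesn't say how the token vectors are combined. The example sentences are made up.

```python
from functools import lru_cache

import numpy as np


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0).

    hidden: (batch, seq_len, dim), mask: (batch, seq_len) -> (batch, dim)
    """
    mask = mask[..., None].astype(hidden.dtype)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


@lru_cache(maxsize=1)
def _load(model_name: str):
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import M2M100Model, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100Model.from_pretrained(model_name).eval()
    return tokenizer, model


def embed(texts, lang, model_name="facebook/m2m100_418M") -> np.ndarray:
    """Mean-pooled M2M100 encoder embeddings for `texts`, tagged as `lang`."""
    import torch

    tokenizer, model = _load(model_name)
    tokenizer.src_lang = lang  # prepends the language token, e.g. __fr__
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model.get_encoder()(
            input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
        )
    return mean_pool(out.last_hidden_state.numpy(), batch["attention_mask"].numpy())


if __name__ == "__main__":
    fr = embed(["Le chat dort sur le canapé."], lang="fr")[0]
    en = embed(["The cat is sleeping on the couch."], lang="en")[0]
    print(f"cosine similarity: {cosine_similarity(fr, en):.3f}")
```

Mean pooling is only one option. Taking the vector at a single position, or max pooling, would give different similarity scores.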
What if someone forgot to translate the text, so the "English" text is actually still in French? Would it be possible to map the English text into the French embedding space in a way that yields a very low similarity score, so the user can be alerted?
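There is a caveat worth checking before relying on a low score. If the untranslated French text is encoded with the `en` tag, the two inputs differ only in the language token, so their encoder embeddings are likely to be very similar rather than dissimilar. The minimal sketch below therefore pairs a similarity threshold with a plain string comparison. `translation_alerts` and its default `threshold` of 0.5 are hypothetical and would need calibration on known-good pairs. `src_vec` and `tgt_vec` can be any sentence embeddings, such as mean-pooled M2M100 encoder outputs.

```python
from __future__ import annotations

import unicodedata

import numpy as np


def _normalize(text: str) -> str:
    """Unicode-normalize, case-fold and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def translation_alerts(source: str, target: str,
                       src_vec: np.ndarray, tgt_vec: np.ndarray,
                       threshold: float = 0.5) -> list[str]:
    """Return warnings for a (source, translation) pair.

    `threshold` is a hypothetical cut-off; calibrate it on known-good pairs.
    """
    alerts = []
    # A verbatim copy is the clearest sign of a forgotten translation.
    if _normalize(source) == _normalize(target):
        alerts.append("target is identical to source (possibly untranslated)")
    sim = float(np.dot(src_vec, tgt_vec)
                / (np.linalg.norm(src_vec) * np.linalg.norm(tgt_vec)))
    # A low score suggests the texts do not mean the same thing.
    if sim < threshold:
        alerts.append(f"low cosine similarity ({sim:.2f} < {threshold})")
    return alerts
```

The string check is deliberately crude. It catches exact copies but not partially translated text, which is where the embedding score would have to do the work.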