Matching original and translated words with MarianMT

Hello, hopefully I’m in the right sub.

After translating a sentence with MarianMT, I’m trying to match the original words with the translated words they generated, or at least come up with a probability for each match.

For example, I want to translate from English to German and I have the following sentence pair.

En: I will buy a washing machine tomorrow.
De: Ich werde morgen eine Waschmaschine kaufen

I want to be able to say that the model took “washing” and “machine” from the original English sentence and matched them with “Waschmaschine” in the translated text.

Do you have any tips on how to achieve this? The original MarianMT project mentions that a scoring algorithm can be used to align two sentences. That seems similar, but not exactly what I’m after. Any ideas?

I’m not exactly sure how to do this, but I think what you are basically trying to do is find how strongly the decoder’s attention mechanism “attends” to each token that comes in from the encoder. An example is presented here in this blog post.

Based on this idea, you could try taking the final transformer layer in the decoder and visualizing its cross-attention weights, i.e. how much each generated token attends to each of the encoder’s tokens.
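
To make that a bit more concrete, here is a rough sketch of the post-processing step, not a tested recipe for MarianMT. It assumes you already have one cross-attention matrix (rows are target pieces, columns are source pieces, each row sums to 1). If you are using the Hugging Face transformers port, calling the model with `output_attentions=True` returns `cross_attentions`, one tensor per decoder layer with shape (batch, heads, target_len, source_len), so taking the last layer and averaging over heads is one way to get such a matrix. The helper names `group_subwords` and `word_alignment`, the token pieces, and the numbers in the toy matrix are all made up for illustration; the real tokenizer may split the words differently. Also keep in mind that attention weights are only a rough proxy for word alignment.

```python
from typing import List, Sequence, Tuple

WORD_START = "\u2581"        # SentencePiece marker for the start of a word
SPECIAL = {"</s>", "<pad>"}  # assumed special tokens, each kept as its own "word"


def group_subwords(tokens: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Merge subword pieces into words; return the words and the piece indices of each."""
    words: List[str] = []
    groups: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith(WORD_START) or tok in SPECIAL or not groups:
            words.append(tok.lstrip(WORD_START))
            groups.append([i])
        else:
            # continuation piece: glue it onto the previous word
            words[-1] += tok
            groups[-1].append(i)
    return words, groups


def word_alignment(attn, src_tokens, tgt_tokens):
    """Collapse a piece-level attention matrix attn[t][s] into a word-level table.

    Source pieces of one word are summed, target pieces of one word are averaged,
    so each row of the result is still a distribution over source words.
    """
    src_words, src_groups = group_subwords(src_tokens)
    tgt_words, tgt_groups = group_subwords(tgt_tokens)
    table = []
    for t_idx in tgt_groups:
        row = []
        for s_idx in src_groups:
            mass = sum(attn[t][s] for t in t_idx for s in s_idx) / len(t_idx)
            row.append(mass)
        table.append(row)
    return src_words, tgt_words, table


# Toy input: hypothetical pieces and invented attention weights, NOT model output.
toy_src = ["\u2581washing", "\u2581machine", "</s>"]
toy_tgt = ["\u2581Wasch", "maschine", "</s>"]
toy_attn = [
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]

src_words, tgt_words, table = word_alignment(toy_attn, toy_src, toy_tgt)
for word, row in zip(tgt_words, table):
    scores = ", ".join(f"{s}={p:.2f}" for s, p in zip(src_words, row))
    print(f"{word}: {scores}")
```

Each printed row can be read as a rough probability that the target word came from each source word, which is roughly the “washing” + “machine” to “Waschmaschine” mapping you are asking about. You may want to drop the `</s>` column and renormalize, since it often soaks up a lot of attention.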

I just googled and found this library that does this for you - link.