How to get translation with attention using MarianMT

Hi, I am trying to reproduce translation with attention using the MarianMT model, as in the TF tutorial.

Basically, the goal is to tell which source word corresponds to each generated word.
I am not sure whether I have used the correct field of the transformers output.
Here is the core code:

from transformers import MarianMTModel, MarianTokenizer
import numpy as np

class MarianZH:
    def __init__(self):
        model_name = 'Helsinki-NLP/opus-mt-en-zh'
        self.tokenizer = MarianTokenizer.from_pretrained(model_name)
        print(self.tokenizer.supported_language_codes)
        self.model = MarianMTModel.from_pretrained(model_name)

    def input_format(self, en_text):
        # Prepend the target-language token expected by the multilingual en-zh model.
        if isinstance(en_text, list):
            src_text = [">>cmn_Hans<< " + i for i in en_text]
        elif isinstance(en_text, str):
            src_text = [">>cmn_Hans<< " + en_text]
        else:
            raise TypeError("Unsupported type of {}".format(en_text))
        return src_text

    def get_attention_weight(self, en_text):
        src_text = self.input_format(en_text)
        batch = self.tokenizer.prepare_seq2seq_batch(src_text)
        tensor_output = self.model(batch['input_ids'],
                                   attention_mask=batch['attention_mask'],
                                   return_dict=True,
                                   output_attentions=True)
        # cross_attentions[-1] is the last decoder layer,
        # shape: (batch, heads, target_len, source_len)
        attention_weights = tensor_output.cross_attentions[-1].detach()
        batch_size, attention_heads, output_seq_length, input_seq_length = attention_weights.shape
        translated = self.model.generate(**batch)  # generated translation (not used above)
        for i in range(batch_size):
            attention_weight_i = attention_weights[i]  # (heads, target_len, source_len)
            cross_weight = np.sum(attention_weight_i.numpy(), axis=0)  # sum over the heads
            yield cross_weight

if __name__ == '__main__':
    # input_format() already adds the >>cmn_Hans<< token, so pass the plain sentence here.
    src_text = [
        'Thai food is delicious.',
        ]
    mdl = MarianZH()
    for cross_weight in mdl.get_attention_weight(src_text):
        print(cross_weight.shape)

Btw, I am using transformers==3.5.1.

Is this cross_weight the attention matrix that corresponds to the translation attention? The output always seems to focus on the first or last columns.

I’m not sure whether this attention weight matrix is even accessible in the MarianMT model, since its structure is different from the one in the TF tutorial. Could anyone tell me whether this is possible?

AFAIK Marian is implemented as a subclass of BART for generation, which outputs cross attention weights:
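Something along these lines should give you weights that actually line up with the generated tokens: run generate() first, then do a second forward pass with the generated ids as decoder_input_ids. This is an untested sketch, assuming your transformers version accepts these arguments and returns cross_attentions from the forward pass, as your code suggests it does:

# Untested sketch: re-run the model with the generated ids as decoder inputs,
# so the cross-attention rows correspond to the actual translation tokens.
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = 'Helsinki-NLP/opus-mt-en-zh'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer.prepare_seq2seq_batch(['>>cmn_Hans<< Thai food is delicious.'],
                                        return_tensors='pt')
translated = model.generate(**batch)

with torch.no_grad():
    outputs = model(input_ids=batch['input_ids'],
                    attention_mask=batch['attention_mask'],
                    decoder_input_ids=translated,  # align decoder positions with the translation
                    output_attentions=True,
                    return_dict=True)

# outputs.cross_attentions: one tensor per decoder layer,
# each of shape (batch, heads, target_len, source_len)
cross = outputs.cross_attentions[-1][0].mean(dim=0)  # average over heads
src_tokens = tokenizer.convert_ids_to_tokens(batch['input_ids'][0].tolist())
tgt_tokens = tokenizer.convert_ids_to_tokens(translated[0].tolist())
for tgt, row in zip(tgt_tokens, cross):
    print(tgt, '->', src_tokens[row.argmax().item()])

If you can upgrade, more recent transformers versions can also return the cross attentions directly from generate() via return_dict_in_generate=True together with output_attentions=True.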

cc @patrickvonplaten

Hi, I’m also analyzing the cross-attentions of the MarianMT transformer model. However, I observed that the input/output alignment from layers 0 and 1 of output.cross_attentions is better than the cross-attention weights from layers 5 or 6. I even visualized the alignment using bertviz (jessevig/bertviz · Discussions · GitHub). I’m very confused about why the bottom layers are better than the top layers. Do you get the same results? Do you have any idea how to explain it? I’m now wondering whether layer 0 is actually the output layer.
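For what it’s worth, this is the kind of quick check I run to compare layers (a rough sketch; it assumes cross_attentions comes from a forward pass where the generated ids were passed as decoder_input_ids, as in the snippet in the earlier reply):

def layerwise_alignments(cross_attentions, src_tokens, tgt_tokens):
    # cross_attentions: tuple with one tensor per decoder layer,
    # each of shape (batch, heads, target_len, source_len)
    for layer_idx, layer_att in enumerate(cross_attentions):
        weights = layer_att[0].mean(dim=0)  # average heads -> (target_len, source_len)
        pairs = [(tgt, src_tokens[row.argmax().item()])
                 for tgt, row in zip(tgt_tokens, weights)]
        print("layer {}: {}".format(layer_idx, pairs))

Printing the argmax alignment per layer like this at least makes it easy to see which decoder layer tracks the source words most closely.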