How to get cross-attention values of T5?

I am trying to fine-tune the T5 model and need the cross-attention scores as well as the self-attention scores. However, when I set output_attentions=True, the model only returns the self-attention values.
Any idea how to get the cross-attention values, e.g. 6 elements of shape B,8,Tx,Ty? (num_heads=8, num_layers=6)

I am doing a forward call on the T5 model:
from transformers import T5ForConditionalGeneration
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
outputs = t5(input_ids=input_ids, labels=output_ids, use_cache=False, output_attentions=True, output_hidden_states=True)
The output contains 7 elements:

outputs[0]: loss
outputs[1]: lm_logits : B,Ty,V
outputs[2]: decoder_hiddens : B,Ty,H
outputs[3]: decoder_self_attentions # B,8,Ty,Ty
outputs[4]: encoder_last_hidden_state ?
outputs[5]: encoder_hiddens ?
outputs[6]: encoder_self_attentions # B,8,Tx,Tx

Thanks a lot!


I have been trying to get the cross-attention weights as well, for the MarianMT model.
Looking at the source code, I found that the attention weights from the encoder-decoder attention layer weren't being included in the attentions that are returned.
I'm basing this on the DecoderLayer class here

If you do find a workaround, please share it in this thread!



In this pull request you can see the work-around for t5.

So in your case, I guess the following change should work:

  • add the _ variable (which holds the cross-attentions) to the returned attention values, so that you get layer-wise cross-attention scores when output_attentions=True.

Hope it helps,