I am trying to finetune the T5 model and need the cross-attention scores as well as the self-attention scores. However, when I set output_attentions=True, the model only returns self-attention values.
Any idea how to get the cross-attention values, e.g. 6 tensors (one per decoder layer) of shape (B, 8, Ty, Tx)? (num_heads=8, num_layers=6)
I am doing a forward call on the T5 model:
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
outputs = t5(input_ids=input_ids, labels=output_ids, use_cache=False, output_attentions=True, output_hidden_states=True)
The output returns 7 elements.
I have been trying to get the cross-attention weights as well for the MarianMT model.
I found out from the source code that the attention weights from the encoder-decoder layer weren't being included in the attentions being returned.
I'm basing my assumptions on the DecoderLayer class here.
If you do find a workaround, please share it on this thread!