I am trying to finetune the T5 model and need the cross-attention scores as well as the self-attention scores. However, when I set output_attentions=True, the model only returns self-attention values.
Any idea how to get the cross-attention values, e.g. 6 tensors (one per decoder layer) of shape (B, 8, Ty, Tx)? (num_heads=8, num_layers=6)
I am doing a forward call on the T5 model:
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
outputs = t5(input_ids=input_ids, labels=output_ids, use_cache=False, output_attentions=True, output_hidden_states=True)
The output returns 7 elements.
I have been trying to get the cross-attention weights as well for the MarianMT model.
I found out from the source code that the attention weights from the encoder-decoder layer weren't being included in the attentions being returned.
I'm basing my assumptions on the DecoderLayer class here.
If you do find a workaround, please share it on this thread!