Code example for getting cross-attention weights from T5?

Hi fellow hackers – could you point me to a short code snippet showing how to obtain the cross-attention weights from a T5 model (T5ForConditionalGeneration)? I’d like to get, for every decoded token, the attention distribution over all input tokens.

Thanks a lot!
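
One possible approach, as a minimal sketch rather than a definitive recipe: assuming the Hugging Face `transformers` implementation of `T5ForConditionalGeneration`, a forward pass with `output_attentions=True` returns a `cross_attentions` field holding one tensor per decoder layer, each of shape `(batch, heads, target_len, source_len)`. The helper name `cross_attention_weights`, the tiny random config, and the token ids below are made up for illustration; averaging over layers and heads is just one way to reduce the tensors to a single distribution per decoded token.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Tiny randomly initialised T5 so the sketch runs without a download
# (an assumption for illustration only); in practice you would load a
# checkpoint with T5ForConditionalGeneration.from_pretrained(...).
torch.manual_seed(0)
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4, decoder_start_token_id=0)
model = T5ForConditionalGeneration(config).eval()  # eval() turns off dropout


def cross_attention_weights(model, input_ids, decoder_input_ids):
    """Return cross-attention averaged over layers and heads.

    Output shape: (batch, target_len, source_len); each row is a
    distribution over the input tokens for one decoder position.
    """
    with torch.no_grad():
        out = model(input_ids=input_ids,
                    decoder_input_ids=decoder_input_ids,
                    output_attentions=True)
    # out.cross_attentions: tuple with one tensor per decoder layer,
    # each of shape (batch, heads, target_len, source_len).
    stacked = torch.stack(out.cross_attentions)  # (layers, batch, heads, tgt, src)
    return stacked.mean(dim=(0, 2))              # average over layers and heads


# Made-up token ids; with a real checkpoint these come from the tokenizer.
input_ids = torch.tensor([[5, 6, 7, 8, 1]])  # 1 is T5's default EOS id

# Greedy decoding; the returned sequence starts with the decoder start token.
sequences = model.generate(input_ids, max_new_tokens=4, do_sample=False)

# Feeding all but the last token reproduces the decoder inputs that
# produced each generated token, so row i belongs to generated token i.
weights = cross_attention_weights(model, input_ids, sequences[:, :-1])
print(weights.shape)  # (batch, number of generated tokens, number of input tokens)
```

If per-layer or per-head detail matters, skip the `mean` and index into `stacked` directly. Calling `generate` with `output_attentions=True` and `return_dict_in_generate=True` should also expose the cross-attentions step by step, but re-running one forward pass over the finished sequence keeps the shapes simpler.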