How can one visualize the Cross-Attention of a VisionEncoderDecoderModel?

Th3Wh1t3Q · March 20, 2023, 10:25am

I think the title says it all.
I’m trying to highlight the attention results between my image and the text generated by the model.

However, since I don’t fully grasp the concept of attention and the model is complicated by nature, I don’t understand which outputs I need to take to visualize the cross-attention.

The idea would be to draw a heatmap over the source image showing, for each word of the output sentence, which image features the model attended to.

Thank you for your help!

SD01 · November 6, 2023, 5:57pm

Did you figure out how to do it with a Hugging Face model?

Th3Wh1t3Q · November 7, 2023, 7:21am

I tried to overlay the attention maps on the image as a mask after rescaling, but it yielded no results, and I haven’t tried any further for now.
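
For anyone landing here, below is a minimal sketch of the rescale-and-overlay step described above. It is not a confirmed fix for this thread. It assumes a ViT-style encoder (224×224 input, 16×16 patches, a leading [CLS] token, hence 197 encoder positions) and that you already have one decoder layer's cross-attention as an array of shape (num_heads, target_len, source_len), for example one entry of the `cross_attentions` tuple returned when the model is called with `output_attentions=True`. The function names `cross_attention_heatmap` and `overlay` are hypothetical helpers, and random numbers stand in for real attention weights and a real image.

```python
import numpy as np


def cross_attention_heatmap(attn, token_index, image_hw, has_cls_token=True):
    """Turn one generated token's cross-attention into an image-sized heatmap.

    attn: array of shape (num_heads, target_len, source_len) for one layer.
    image_hw: (height, width) of the image the heatmap should cover.
    """
    # Average over heads, then select the row of the token of interest.
    weights = attn.mean(axis=0)[token_index]
    # Assumption: a ViT-style encoder puts a [CLS] token first; it has no
    # spatial location, so it must be dropped before reshaping.
    if has_cls_token:
        weights = weights[1:]
    side = int(round(np.sqrt(weights.shape[0])))
    if side * side != weights.shape[0]:
        raise ValueError("source length does not form a square patch grid")
    grid = weights.reshape(side, side)
    height, width = image_hw
    if height % side or width % side:
        raise ValueError("image size must be a multiple of the patch grid")
    # Nearest-neighbour upsampling: repeat each patch value over its pixels.
    heat = np.kron(grid, np.ones((height // side, width // side)))
    # Min-max normalise so the map stays visible once blended.
    span = heat.max() - heat.min()
    if span == 0:
        return np.zeros_like(heat)
    return (heat - heat.min()) / span


def overlay(image, heat, alpha=0.5):
    """Blend the heatmap into the red channel of an (H, W, 3) image in [0, 1]."""
    red = np.zeros_like(image)
    red[..., 0] = heat
    return (1 - alpha) * image + alpha * red


# Stand-in data: 4 heads, 5 generated tokens, 1 + 14 * 14 encoder positions.
rng = np.random.default_rng(0)
attn = rng.random((4, 5, 197))
attn /= attn.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax output
image = rng.random((224, 224, 3))

heat = cross_attention_heatmap(attn, token_index=2, image_hw=(224, 224))
blended = overlay(image, heat)
print(heat.shape, blended.shape)
```

Two things that can make an overlay look empty are reshaping without dropping the [CLS] position and blending raw attention weights, which are tiny once spread over ~200 patches, without normalising them first. Whether either applies to the attempt above is a guess.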
