I’ve trained a ViT+GPT-2 image captioning model. Now I want to plot the attention maps to see what the model is looking at while it generates a caption. But it seems there is no way to do this through the Seq2SeqTrainer API, or is there?
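In case it helps, here is a minimal sketch of one way to get the maps without going through the Trainer at all: call `generate` on the trained `VisionEncoderDecoderModel` directly with `output_attentions=True` and `return_dict_in_generate=True`, then project one cross-attention map back onto the ViT patch grid. The checkpoint path and image file are placeholders, and the grid reshape assumes a standard ViT with a [CLS] token and a square patch grid (e.g. 14×14 for a 224px input with 16px patches):

```python
import numpy as np
import torch
from PIL import Image
import matplotlib.pyplot as plt
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Placeholder paths: point these at your own fine-tuned checkpoint and image.
ckpt = "path/to/your-checkpoint"
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = ViTImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model.eval()

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = model.generate(
        pixel_values,
        max_new_tokens=20,
        output_attentions=True,        # return attention tensors
        return_dict_in_generate=True,  # needed so .cross_attentions is populated
    )

print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))

# out.cross_attentions is a tuple with one entry per generated token; each entry
# is a tuple over decoder layers of tensors shaped
# (batch, heads, query_len, encoder_len). Take the first generated token and the
# last decoder layer, average over heads, drop the [CLS] position, and reshape
# the remaining patch scores to the ViT grid.
attn = out.cross_attentions[0][-1][0].mean(dim=0)  # (query_len, encoder_len)
patch_scores = attn[-1, 1:]                        # last query position, skip [CLS]
grid = int(patch_scores.numel() ** 0.5)            # assumes a square patch grid
heatmap = patch_scores.reshape(grid, grid).numpy()

# Upsample the heatmap to the image size and overlay it.
heat = Image.fromarray((heatmap / heatmap.max() * 255).astype(np.uint8))
heat = heat.resize(image.size, Image.BILINEAR)
plt.imshow(image)
plt.imshow(np.asarray(heat), alpha=0.5, cmap="jet")
plt.axis("off")
plt.show()
```

Indexing into `out.cross_attentions` by a different step/layer lets you inspect which patches each generated token attends to; this is just one plausible visualization, not the only one.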