Understanding what went wrong in attention

I am working on attention analysis. I want to learn where self-attention made mistakes while attending to the context given a query. Given two sentences, I am interested in finding where self-attention should have paid more attention (and where it attended to irrelevant tokens) in order to produce the correct answer. More generally: what went wrong in processing a given sample, even when a fine-tuned transformer is employed.

While there are visualization projects like BertViz and ExBERT, I am not sure it's straightforward to extract the information I'm looking for with them.

Do you know of any good projects, or workarounds in Transformers, to answer my query?


Can anyone point me to a method for visualizing attention in matrix form between the query and context sentences? Is there any other alternative? Any pointers would be appreciated.

The two you mentioned are the only ones I know of off the top of my head in terms of visualization tools. What are you trying to do that BertViz and ExBERT don't provide? (disclaimer: not an expert in this area)

One tricky thing is that the notion of where the model should or should not have paid more attention is not well defined. There's been debate about whether attention weights can or should be used for interpretation; for example, see [1], [2]. Coming up with a convincing argument that a given attention matrix should look one way or another would probably not be trivial.

Thanks for your helpful reply. I had a look at their abstracts, and I don't yet have a firm opinion on whether attention can fully explain what I'm looking for.

Both are good tools for interactive visualization, but I want something that provides some quantifiability. For now, I'm visualizing attention heatmaps the way srush did in The Annotated Transformer. Since I need to report the results in a paper, I am looking for static visualizations.
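For anyone curious, a static heatmap of this kind takes only a few lines of matplotlib. The sketch below uses a random stand-in matrix (normalized so rows sum to 1, like a softmax); with a real model, the matrix would come from the attention weights instead. The token lists and file name are just placeholders:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for saving static figures
import matplotlib.pyplot as plt

# Placeholder tokens for the query (rows) and context (columns).
query_tokens = ["what", "is", "attention", "?"]
context_tokens = ["attention", "weighs", "token", "relevance", "."]

# Random stand-in for a (query_len, context_len) attention matrix;
# rows are normalized to sum to 1, like softmax output.
rng = np.random.default_rng(0)
attn = rng.random((len(query_tokens), len(context_tokens)))
attn = attn / attn.sum(axis=-1, keepdims=True)

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(context_tokens)))
ax.set_xticklabels(context_tokens, rotation=45)
ax.set_yticks(range(len(query_tokens)))
ax.set_yticklabels(query_tokens)
fig.colorbar(im, ax=ax)
fig.savefig("attention_heatmap.png", bbox_inches="tight")
```

Saving two such figures side by side (one per model, same color scale) also makes the comparison you mention later much easier for a reader.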

Based on my brief interaction with the exBERT live demo, it can be hard for a reader to distinguish what two models are looking at (for comparison purposes).

For my use case, I want the reader to be able to distinguish what two networks attend to and how one is better than the other. I hope that makes sense.

@joeddav Could you please suggest the recommended way to do what exBERT does with our own weights (seeing which tokens in a sentence the model pays attention to)? The HF exBERT demo works for the default pretrained LMs, but I want to run inference with my own trained weights. I'm running experiments on a server, and building npm packages and other tooling seems like a lot of work, though things may have changed a bit after the introduction of the HF Inference API. I'm using bert-base-uncased (pretty standard) and want to load weights from the HF model hub instead.

Got it working by using exBERT locally.
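For anyone who lands here with the same question: if all you need are the raw attention matrices from a checkpoint on the Hub, you can also skip exBERT entirely and use plain Transformers with `output_attentions=True`. A minimal sketch (the model name is the standard `bert-base-uncased`; swap in your own hypothetical Hub checkpoint ID for fine-tuned weights):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Use your own Hub checkpoint ID here for fine-tuned weights (placeholder below).
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

# Encode a query/context sentence pair as one sequence (separated by [SEP]).
inputs = tokenizer("Where was he born?", "He was born in Hawaii.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
attn = outputs.attentions[-1][0]  # last layer, first batch item
print(attn.shape)
```

From there, slicing the rows belonging to the query sentence and the columns belonging to the context sentence (using the token type IDs) gives exactly the query-vs-context matrix asked about above, which can then be plotted as a static heatmap.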
