Optimal methods to monitor attention matrices when doing training/inference using BERT-type models

vgoklani · September 8, 2021, 2:31pm

Our team is using BERT/RoBERTa from the Hugging Face transformers library for sequence classification (amongst other tasks). We are looking for an efficient way to monitor the attention matrices so as to understand what the model is doing during inference (i.e. the model made this prediction because it is focusing on these words, etc.). Are there any useful code snippets for this kind of analysis?

Often the models make funny predictions, and it’s hard to understand why… How are other teams managing this process? We want to avoid large, bloated (graphical) tools and would prefer something simple.

thanks!

TheLongSentance · September 11, 2021, 1:36pm

Just checking: have you seen BertViz? It could be a source of ideas, if nothing else.

vgoklani · September 11, 2021, 2:25pm

Yes, I am looking for something more direct, without all the overhead.
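
For reference, here is a minimal sketch of one direct, tool-free way to look at the attention matrices, assuming PyTorch and the transformers library are installed. Passing `output_attentions=True` to the forward call makes the model return one attention tensor per layer, each shaped (batch, heads, seq_len, seq_len). The helper names `attentions_from_model` and `top_attended_tokens` and the default checkpoint `bert-base-uncased` are illustrative assumptions, not something from this thread. The summary used here (last layer, averaged over heads, attention from the first token) is just one simple choice, and attention weights are at best a rough hint about why a prediction was made.

```python
import numpy as np


def top_attended_tokens(attentions, tokens, k=5):
    """Rank tokens by how much the first token (e.g. [CLS]) attends to them.

    attentions: sequence of per-layer arrays shaped (batch, heads, seq, seq).
    tokens: list of token strings for the first example in the batch.
    Uses the last layer, averaged over heads (an illustrative choice).
    """
    last_layer = np.asarray(attentions[-1])[0]   # (heads, seq, seq)
    first_token_row = last_layer.mean(axis=0)[0]  # (seq,)
    order = np.argsort(first_token_row)[::-1][:k]
    return [(tokens[i], float(first_token_row[i])) for i in order]


def attentions_from_model(text, model_name="bert-base-uncased"):
    """Run one forward pass and return (tokens, attentions, logits).

    model_name is an assumed checkpoint; substitute your fine-tuned model.
    """
    # Imported lazily so the pure-NumPy helper above works without these.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    encoded = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoded, output_attentions=True)

    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0])
    attentions = [layer.numpy() for layer in outputs.attentions]
    return tokens, attentions, outputs.logits
```

Typical usage would be `tokens, attn, logits = attentions_from_model("some input text")` followed by `top_attended_tokens(attn, tokens)`, printed or logged next to the predicted label. The same helper can be called on a few fixed probe sentences at intervals during training to see how the pattern changes.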
