Distilbart-mnli-12-9

thomasdaryl · January 5, 2021, 9:51am

@valhalla
In distilbart, can I identify the weight of the words in the sequence associated with the candidate label/class? I want to narrow down the reason the model assigns a particular score to a given class. For example, if “This is awesome anyone thinking to buy should blindly go for it” is assigned a positive label score of 0.99, is it possible to identify the words in the sequence that carry the most weight or contribute the most to the positive label (in this sequence, the words/phrases “awesome” and “blindly go for it”), and the relative weight (cosine similarity/distance) of those words to the identified class (i.e. positive)?

Can this be done by accessing or manipulating the end layers of the model or by any other method?
Thank you for your help in advance!

valhalla · January 6, 2021, 9:08am

@joeddav might have some ideas here.

joeddav · January 6, 2021, 2:37pm

You can look at the attention weights. This is not exactly “explanation”, but also not entirely not explanation either. That is, it can give you some notion of how “important” a word is, but it is very limited in its explanatory power. Models like these are difficult to interpret.

Pass output_attentions=True to get the attention patterns back when you call the model. You’ll probably want the attention between each word and the EOS token since this embedding is the sequence representation that BartForSequenceClassification uses. But which layers and heads are best to use, or how to aggregate their patterns, goes beyond my knowledge.
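A minimal sketch of that idea, under some loud assumptions: it reads the cross-attention of the last decoder layer at the final EOS position and simply averages over heads. Which layer and heads to use, and how to aggregate them, is an open choice that this thread does not settle. The helper names average_eos_attention and eos_attention_for_pair are made up for illustration, and the checkpoint name valhalla/distilbart-mnli-12-9 is assumed from the thread title.

```python
from typing import List, Tuple


def average_eos_attention(attention, eos_index: int) -> List[float]:
    """Average over heads the attention paid from the EOS position to each token.

    `attention` is a nested sequence shaped [num_heads][tgt_len][src_len];
    entry [h][i][j] is how much target position i attends to source token j.
    """
    num_heads = len(attention)
    src_len = len(attention[0][eos_index])
    scores = [0.0] * src_len
    for head in attention:
        for j, weight in enumerate(head[eos_index]):
            scores[j] += float(weight) / num_heads
    return scores


def eos_attention_for_pair(
    premise: str,
    hypothesis: str,
    model_name: str = "valhalla/distilbart-mnli-12-9",  # assumed checkpoint name
) -> Tuple[List[str], List[float]]:
    """Run the NLI model once and return (tokens, scores). Downloads the model."""
    # Imported lazily so the aggregation helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # cross_attentions holds one tensor per decoder layer,
    # each shaped (batch, heads, tgt_len, src_len). Take the last layer.
    last_layer = outputs.cross_attentions[-1][0].tolist()

    # The classification head reads the hidden state at the final EOS token.
    input_ids = inputs["input_ids"][0].tolist()
    eos_index = max(i for i, t in enumerate(input_ids) if t == tokenizer.eos_token_id)

    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    return tokens, average_eos_attention(last_layer, eos_index)
```

Calling eos_attention_for_pair("This is awesome anyone thinking to buy should blindly go for it", "This example is positive.") would pair the sequence with a hypothesis built from the label, and zipping the returned tokens and scores gives a rough per-token ranking. Treat the numbers as a hint at best, for the reasons given above.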

As an aside for any interested readers, we have an awesome tool for visualizing transformers which is really helpful for building intuition about attention patterns in transformers.

thomasdaryl · January 16, 2021, 3:04pm

Hi, thank you for your response. I'll try to take a deep dive into those; that seems like a good place to start.

thomasdaryl · January 17, 2021, 2:25pm

Hi, there is a library released by PyTorch for model interpretability. It uses the internal embeddings and integrated gradients to give saliency features and attributions for the input words. They have shown its application to a BERT model trained on SQuAD for question answering. Is it possible to do the same on these BART models? I am not really well versed in how the pipeline works. Is it possible to run it on the distilbart zero-shot models? I presume I might have to replicate the zero-shot pipeline and pass the model through their interpretation functions.

joeddav · January 19, 2021, 4:50pm

I don’t have a complete answer for you, but don’t worry about the pipeline itself, as it’s just a wrapper for the model. I assume this code is designed for BertModel, so the hardest thing you’ll probably need to do is adapt it for BartModel or BartForSequenceClassification.
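For readers unfamiliar with integrated gradients, here is a framework-free sketch of the technique itself. It is not BART-specific and does not use the interpretability library mentioned above: the "model" is a toy sigmoid over three features with hypothetical weights, and the function names are made up for illustration. Adapting this to BartForSequenceClassification would mean interpolating token embeddings instead of raw features, which is the harder part.

```python
import math
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    inputs: Sequence[float],
    baseline: Sequence[float],
    n_steps: int = 50,
) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * average gradient_i along the
    straight line from the baseline to the input.
    """
    avg_grad = [0.0] * len(inputs)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (x - b) for x, b in zip(inputs, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / n_steps
    return [(x - b) * g for x, b, g in zip(inputs, baseline, avg_grad)]


# Toy stand-in for a classifier score (hypothetical weights, not a real model).
WEIGHTS = [2.0, -1.0, 0.5]


def score(x: Sequence[float]) -> float:
    """Sigmoid of a weighted sum of the features."""
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(WEIGHTS, x))))


def score_grad(x: Sequence[float]) -> List[float]:
    """Analytic gradient of `score` with respect to each feature."""
    s = score(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(score_grad, x, baseline)
```

A useful sanity check is that the attributions sum to approximately score(x) - score(baseline); if they do not, the number of steps is probably too small.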
