In DistilBART, can I identify how much weight each word in the sequence carries toward a candidate label/class? I want to narrow down why the model assigns a particular score to a given class. For example, if “This is awesome anyone thinking to buy should blindly go for it” is assigned a positive-label score of 0.99, is it possible to identify the words in the sequence that contribute the most to the positive label (in this sequence, the words/phrases "awesome" and "blindly go for it") and the relative weight (e.g. cosine similarity/distance) of those words with respect to the identified class (i.e. positive)?
Can this be done by accessing or manipulating the final layers of the model, or by some other method?
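To make the question concrete, here is a minimal sketch of one model-agnostic option, leave-one-out occlusion: drop each word in turn, re-score the sequence, and treat the drop in the class score as that word's contribution. Everything named here is an assumption for illustration. `occlusion_attribution` and `toy_positive_score` are hypothetical helpers, and the toy scorer is only a stand-in for whatever function returns the model's score for the chosen label (for example, a wrapper around a zero-shot classification call); it is not DistilBART.

```python
from typing import Callable, List, Tuple


def occlusion_attribution(
    text: str, score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (word, contribution) pairs using leave-one-out occlusion.

    contribution = score(full text) - score(text with that word removed).
    A large positive value means the class score depends heavily on the word.
    """
    words = text.split()
    base_score = score_fn(text)
    contributions = []
    for i, word in enumerate(words):
        # Rebuild the sentence without the i-th word and re-score it.
        reduced = " ".join(words[:i] + words[i + 1:])
        contributions.append((word, base_score - score_fn(reduced)))
    return contributions


def toy_positive_score(text: str) -> float:
    """Hypothetical stand-in for the model's positive-label score.

    In practice this would call the real classifier and return the score
    of the label of interest; the cue weights below are made up.
    """
    cues = {"awesome": 0.6, "blindly": 0.2}
    tokens = text.lower().split()
    return min(1.0, 0.1 + sum(w for cue, w in cues.items() if cue in tokens))


if __name__ == "__main__":
    sentence = "This is awesome anyone thinking to buy should blindly go for it"
    ranked = sorted(
        occlusion_attribution(sentence, toy_positive_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
    for word, contribution in ranked:
        print(f"{word:>10s}  {contribution:+.2f}")
```

This only perturbs the input, so it needs one forward pass per word and does not look at the model's internal layers. Gradient-based attribution on the embeddings would be the alternative that does use model internals, and it is not shown here.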
Thank you for your help in advance!