Approach to get info about word importance

I am looking for a holistic way of calculating word importance, such as this one, but the approach should be representative of all heads and layers, or at least cover the majority of them. I've tried heatmaps, but special tokens such as CLS make it hard to interpret what's happening with the other tokens. I also tried Captum (as in the pic), but it doesn't seem to extend well to my MultipleChoiceModel variants; I had issues with LayerConductance. Does anyone have a good approach or implementation they can suggest that helps visualize which tokens are primarily responsible for the model selecting the right option?
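
To make the goal concrete, here is a rough sketch of the kind of aggregate score I mean. It is only an illustration under assumptions, not a tested solution. It assumes the attention weights are already available as a nested list indexed `[layer][head][query][key]` (for example, converted from a model's attention outputs), and that a multiple-choice model would be scored one option at a time. The function name `token_importance` and the default special-token list are hypothetical. Raw attention is also not guaranteed to be a faithful measure of importance, so this is a starting point at best.

```python
def token_importance(attentions, tokens,
                     special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Aggregate attention received by each token over all layers and heads.

    attentions: nested list indexed [layer][head][query][key] (assumed layout).
    tokens: list of token strings, one per sequence position.
    Returns one score per token; special tokens get 0, the rest sum to 1.
    """
    n = len(tokens)
    received = [0.0] * n

    for layer in attentions:
        for head in layer:
            for q, row in enumerate(head):
                # Skip rows where a special token is the one attending.
                if tokens[q] in special_tokens:
                    continue
                for k, weight in enumerate(row):
                    received[k] += weight

    # Drop the attention mass that landed on special tokens.
    scores = [0.0 if tokens[k] in special_tokens else received[k]
              for k in range(n)]

    # Renormalize so the remaining tokens are comparable across inputs.
    total = sum(scores)
    if total == 0:
        return scores
    return [s / total for s in scores]


# Toy example: 1 layer, 1 head, 4 tokens (values are made up).
tokens = ["[CLS]", "cat", "sat", "[SEP]"]
attentions = [[[
    [0.7, 0.1, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]]]
scores = token_importance(attentions, tokens)
for tok, s in zip(tokens, scores):
    print(f"{tok:>6}  {s:.2f}")
```

Masking the special tokens before renormalizing is what keeps CLS from dominating the picture; whether a plain sum over heads and layers is representative enough is exactly the part I am unsure about.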