Getting explanations for BERT classifications

So I’ve been using BERT models for text classification and it’s all going great, but I was just wondering: is there any way to have a BERT model provide some sort of explanation for the classifications it makes?

For example, say a text has been classified as A rather than B. Can we then get back a heatmap showing which parts of the text, which relations within it, or which individual words contributed to it being more likely an A than a B?
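
To make it concrete, here’s the sort of thing I’m imagining. This is just a sketch using SHAP’s text explainer on a Hugging Face pipeline, assuming `shap` and `transformers` are installed; the model name is a stand-in for whatever fine-tuned BERT you’re actually using, and I don’t know yet whether this is the recommended approach:

```python
import shap
from transformers import pipeline

# Any fine-tuned BERT-style classifier works here; this model name is a placeholder.
clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    return_all_scores=True,  # SHAP wants a score for every class, not just the top one
)

explainer = shap.Explainer(clf)
shap_values = explainer(["The plot was thin but the acting carried it."])

# In a notebook this renders the text with each token coloured by how much it
# pushed the prediction towards (red) or away from (blue) each class label.
shap.plots.text(shap_values[0])
```

Captum’s integrated-gradients attributions would presumably be the other obvious route to a token-level heatmap, but SHAP looked like the shortest path.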

And separately from any individual classification, is there some way to visualize the overall kinds of things the BERT model is focused on for a given dataset (assuming we’ve fine-tuned it on that dataset), or some kind of summary of the key features it relies on across the larger dataset it was originally trained on?
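
For this dataset-level question, the closest thing I can picture is embedding every example with the (fine-tuned) model and projecting the per-example [CLS] vectors down to 2-D, coloured by label, to see what the model has grouped together. A rough sketch, where the checkpoint, texts, and labels are all placeholders:

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; in practice this would be the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

texts = ["an example from class A", "an example from class B"]  # your dataset
labels = [0, 1]  # the corresponding class labels

with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc)
    cls_vectors = out.last_hidden_state[:, 0, :]  # one [CLS] embedding per example

# Perplexity must be smaller than the number of samples; tune it on a real dataset.
coords = TSNE(n_components=2, perplexity=min(30, len(texts) - 1)).fit_transform(
    cls_vectors.numpy()
)

plt.scatter(coords[:, 0], coords[:, 1], c=labels)
plt.title("BERT [CLS] embeddings, t-SNE projection")
plt.show()
```

UMAP would presumably work in place of t-SNE, and mean-pooling the token embeddings instead of taking [CLS] is another common choice; I don’t know which is better here.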

Many thanks in advance

I have found this blog post, “Visualize BERT sequence embeddings: An unseen way” by Tanmay Garg on Towards Data Science, and this technical paper, “Visualizing and Measuring the Geometry of BERT” on DeepAI. I’ll need to spend some time digesting them to see if they’re what I’m thinking of.