Outputting gradients with respect to Attentions from a trained model?

Hi, this might not be possible, but I am looking for a way to get the gradients of the logits with respect to the attention weights (on a per-head, per-layer basis).

Essentially, I am looking for dy/dA, where "y" is the logit output and "A" is the attention matrix of a particular head in a particular self-attention layer. The model has already been trained/fine-tuned.

Is this possible to do?