Hi, this might not be possible, but I am looking for a way to get the gradients of the logits with respect to the attentions (on a per-head, per-layer basis).
Essentially, I am looking for dy/dA, where "y" is a logit output and "A" is the self-attention matrix of a particular head in a particular layer. The model has already been trained/fine-tuned.
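For concreteness, here is a rough, untested sketch of what I have in mind, assuming a Hugging Face Transformers classifier; the model name, input text, and layer/head indices are just placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint -- in practice this would be my fine-tuned model
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention (if the installed version supports the kwarg) so the
# per-head attention probabilities are materialized and kept in the
# autograd graph, rather than fused away by sdpa/flash attention
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, attn_implementation="eager"
)
model.eval()

inputs = tokenizer("An example sentence.", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# Pick one logit y (here: class 0 of the first example in the batch)
y = outputs.logits[0, 0]

# outputs.attentions is a tuple of per-layer tensors of shape
# (batch, num_heads, seq_len, seq_len); grads has the same structure
grads = torch.autograd.grad(y, outputs.attentions, retain_graph=True)

# e.g. dy/dA for head 3 of layer 5:
dy_dA = grads[5][0, 3]  # (seq_len, seq_len)
```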
Is this possible to do?
Thanks!