Query, key and value projections are used to compute attention. In BERT we can extract the attention weights for each layer and each head by passing `output_attentions=True`. How can I extract the attention *gradients* for each layer and each head?
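One common way to do this with the Hugging Face `transformers` BERT models in PyTorch is to request the attention tensors with `output_attentions=True`, call `retain_grad()` on them (they are non-leaf tensors, so their gradients are discarded by default), run `backward()` on a scalar, and then read each tensor's `.grad`. Below is a minimal sketch; the checkpoint `bert-base-uncased`, the `BertForSequenceClassification` head, and the choice of backward target (a single logit) are illustrative assumptions, not the only option.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()  # disables dropout; gradients still flow

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
for att in outputs.attentions:
    att.retain_grad()  # keep gradients for these non-leaf tensors

# Any scalar works as the backward target; here, the logit of class 0
# (with a labeled batch you could use the model's loss instead).
target = outputs.logits[0, 0]
target.backward()

# Gradient of the chosen scalar w.r.t. each layer's attention weights,
# indexed as attention_grads[layer][batch, head, query_pos, key_pos].
attention_grads = [att.grad for att in outputs.attentions]
print(attention_grads[0].shape)
```

Each entry of `attention_grads` has the same shape as the corresponding attention tensor, so you can slice out a particular layer and head, e.g. `attention_grads[layer][0, head]`.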