Hi,
When using the chunked self-attention layer in Reformer, the attention weight matrix has a different shape than with global self-attention. The documentation doesn't say anything about this, so I dug into the code to understand why; it seems to be related to the chunking mechanism.
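For reference, this is roughly how I get the attention weights (a minimal sketch with made-up config values, not my real setup):

```python
import torch
from transformers import ReformerConfig, ReformerModel

# Toy config using only local (chunked) self-attention; the values are illustrative.
config = ReformerConfig(
    attn_layers=["local"],
    axial_pos_embds=False,        # plain learned position embeddings for this toy example
    local_attn_chunk_length=16,
    local_num_chunks_before=1,
    local_num_chunks_after=0,
)
model = ReformerModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 64))
outputs = model(input_ids, output_attentions=True)

# Chunked attention weights of the first (and only) layer
print(outputs.attentions[0].shape)
```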
However, I am struggling to recover the equivalent attention weight matrix that a classical global attention layer would produce.
Does anyone have an idea how to do this?
Global attention: attention weight shape (batch_size, num_heads, sequence_length, sequence_length)
Chunked attention: attention weight shape (batch_size, num_heads, num_chunks, attn_chunk_length, attn_chunk_length * (1 + num_chunks_before + num_chunks_after))
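To make it concrete, here is the kind of reconstruction I have been trying. It assumes the chunked weights can be viewed as (batch_size, num_heads, num_chunks, attn_chunk_length, attn_chunk_length * (1 + num_chunks_before + num_chunks_after)), and that the key window of chunk i is the concatenation of chunks i - num_chunks_before, ..., i, ..., i + num_chunks_after with circular wrap-around, which is how I read _look_adjacent in modeling_reformer.py, but I may well be misreading it:

```python
import torch

def unchunk_attention(chunked, num_chunks_before, num_chunks_after):
    """Naive attempt to scatter chunked attention weights back onto a dense
    (batch, num_heads, seq_len, seq_len) matrix.

    Assumes `chunked` has shape
    (batch, num_heads, num_chunks, chunk_len, chunk_len * (1 + num_chunks_before + num_chunks_after))
    and that the key window of query chunk i covers chunks
    i - num_chunks_before, ..., i, ..., i + num_chunks_after, wrapping around the sequence.
    """
    batch, heads, num_chunks, chunk_len, _ = chunked.shape
    seq_len = num_chunks * chunk_len
    full = torch.zeros(batch, heads, seq_len, seq_len,
                       dtype=chunked.dtype, device=chunked.device)

    for i in range(num_chunks):
        q_start = i * chunk_len
        # Chunk indices that (I assume) make up the key window of query chunk i
        key_chunks = [(i + off) % num_chunks
                      for off in range(-num_chunks_before, num_chunks_after + 1)]
        for w, j in enumerate(key_chunks):
            k_start = j * chunk_len
            full[:, :, q_start:q_start + chunk_len, k_start:k_start + chunk_len] = \
                chunked[:, :, i, :, w * chunk_len:(w + 1) * chunk_len]
    return full
```

Positions outside a query's window are left at zero. I am not sure this handles the wrap-around of the first chunks, the attention mask, or padding to a multiple of the chunk length correctly, hence my question.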
Thanks