Reformer - attention data format

judicarta · December 9, 2020, 4:21pm

I am trying to analyze the attention data from a Reformer model. I use the following settings:
"lsh_attn_chunk_length": 64,
"local_attn_chunk_length": 64,

I expect attention (output) shape to be:
(batch_size, number_of_heads, number_of_chunks, 64, 64)
The result (shape of attention for each layer) I get is:
torch.Size([1, 2, 41, 64, 128])
The first four dimensions match what I expect, but the last dimension is 128 instead of 64.

How do I interpret the last dimension (128)?

The output feature size of the attention layer is 128 - does it mean the output
(self_attention): LSHSelfAttention(
(query_key): Linear(in_features=256, out_features=128, bias=False)

Then what about the 64 x 64 I expected for the attention?

jayeshp · June 29, 2023, 7:28pm

I have the same issue. My params are:
lsh_attn_chunk_length = 64 # default value
local_attn_chunk_length = 64 # default value
axial_pos_shape = (64, 99)
default values for the rest

The shape I get for attention[0] is torch.Size([1, 12, 99, 64, 128])
Should I assume that the attention is a vector of length 128 for each of the 99x64 embeddings?
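
One possible reading of the shapes reported above, offered as a sketch rather than a confirmed answer: in Reformer's chunked attention, each query chunk attends to its own chunk plus a number of neighbouring chunks, so the last axis would count keys, not embedding features. Assuming the ReformerConfig options lsh_num_chunks_before / local_num_chunks_before default to 1 and lsh_num_chunks_after / local_num_chunks_after default to 0, each of the 64 queries in a chunk sees 64 * (1 + 1 + 0) = 128 keys. The helper below is hypothetical (it is not part of the transformers library) and only does the shape arithmetic under those assumptions:

```python
def expected_attention_shape(batch_size, num_heads, seq_len, chunk_length=64,
                             num_chunks_before=1, num_chunks_after=0,
                             num_hashes=1):
    """Hypothetical helper: expected shape of one layer's chunked attention.

    Assumes each query chunk attends to itself plus `num_chunks_before`
    previous and `num_chunks_after` following chunks, and that LSH layers
    produce `num_hashes` rounds of chunks.
    """
    if seq_len % chunk_length != 0:
        raise ValueError("seq_len must be a multiple of chunk_length")
    # Number of query chunks (multiplied by the hash rounds for LSH layers).
    num_chunks = num_hashes * seq_len // chunk_length
    # Keys visible to each query: own chunk plus the neighbouring chunks.
    keys_per_query = chunk_length * (1 + num_chunks_before + num_chunks_after)
    return (batch_size, num_heads, num_chunks, chunk_length, keys_per_query)


# axial_pos_shape = (64, 99) implies a sequence length of 64 * 99 = 6336.
print(expected_attention_shape(1, 12, 64 * 99))
```

Under this reading, attention[0][b, h, c, q, :] would be the 128 attention weights of query q in chunk c over the keys of that chunk and its neighbour, not a 128-dimensional vector per embedding. For LSH layers the chunks are formed after tokens are sorted by hash bucket, so positions inside a chunk need not follow the original token order.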
