Reformer - attention data format

I am trying to analyze attention data from the Reformer model. I use the following settings:
"lsh_attn_chunk_length": 64,
"local_attn_chunk_length": 64,

I expect the attention (output) shape to be:
(batch_size, number_of_heads, number_of_chunks, 64, 64)
The result (shape of attention for each layer) I get is:
torch.Size([1, 2, 41, 64, 128])
The first four dimensions match what I expect; the last dimension is 128 instead of 64.
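
For reference, here is a minimal sketch that reproduces the shape above. The chunk lengths are the ones I set; hidden_size, num_attention_heads, attention_head_size, attn_layers, axial_pos_shape and is_decoder are assumptions I filled in so that the printed shape matches [1, 2, 41, 64, 128] (in particular, a sequence length of 64 * 41 = 2624 gives the 41 chunks):

```python
import torch
from transformers import ReformerConfig, ReformerModel

# Assumed values below, not necessarily my full setup; chosen to match the reported shape.
config = ReformerConfig(
    lsh_attn_chunk_length=64,
    local_attn_chunk_length=64,
    hidden_size=256,
    num_attention_heads=2,
    attention_head_size=64,
    attn_layers=["local", "lsh", "local", "lsh", "local", "lsh"],
    axial_pos_shape=(64, 41),       # product = 2624 = assumed sequence length
    axial_pos_embds_dim=(64, 192),  # must sum to hidden_size
    is_decoder=True,                # assumption
)
model = ReformerModel(config).eval()

# sequence length equals prod(axial_pos_shape) so the axial position embeddings fit
input_ids = torch.randint(0, config.vocab_size, (1, 64 * 41))

with torch.no_grad():
    outputs = model(input_ids, output_attentions=True)

for layer_idx, attn in enumerate(outputs.attentions):
    print(layer_idx, attn.shape)  # torch.Size([1, 2, 41, 64, 128]) for each layer here
```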

How do I interpret the last dimension (128)?

The output feature size of the attention layer is 128 - does the last dimension correspond to that, i.e. to the query_key projection below?
(self_attention): LSHSelfAttention(
  (query_key): Linear(in_features=256, out_features=128, bias=False)

If so, then what about the 64 × 64 (for the attention weights)?
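
To make the question concrete: with the assumed settings from the sketch above, two different quantities both happen to equal 128, and I cannot tell which one the last dimension of the attention tensor corresponds to (lsh_num_chunks_before / lsh_num_chunks_after are left at their defaults):

```python
from transformers import ReformerConfig

# Same assumed settings as in the sketch above.
config = ReformerConfig(
    lsh_attn_chunk_length=64,
    local_attn_chunk_length=64,
    hidden_size=256,
    num_attention_heads=2,
    attention_head_size=64,
)

# 1) width of the query_key projection shown above (Linear(256, 128))
print(config.num_attention_heads * config.attention_head_size)  # 2 * 64 = 128

# 2) keys each query chunk can attend to: chunk_length * (chunks_before + chunks_after + 1)
print(config.lsh_attn_chunk_length
      * (config.lsh_num_chunks_before + config.lsh_num_chunks_after + 1))  # 64 * (1 + 0 + 1) = 128
```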

I have the same issue. My params are:
lsh_attn_chunk_length = 64 # default value
local_attn_chunk_length = 64 # default value
axial_pos_shape = (64, 99)
default values for the rest

The shape I get for attention[0] is torch.Size([1, 12, 99, 64, 128])
Should I assume that the attention is a vector of length 128 for each of the 99x64 embeddings?
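
In case it helps, the same kind of sketch with my settings (everything not listed above is left at its default; is_decoder=True is again an assumption) gives the shape I quoted:

```python
import torch
from transformers import ReformerConfig, ReformerModel

config = ReformerConfig(
    lsh_attn_chunk_length=64,    # default value
    local_attn_chunk_length=64,  # default value
    axial_pos_shape=(64, 99),    # product = 6336 = assumed sequence length
    is_decoder=True,             # assumption, as in the sketch above
)
model = ReformerModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 64 * 99))
with torch.no_grad():
    attention = model(input_ids, output_attentions=True).attentions

print(attention[0].shape)  # torch.Size([1, 12, 99, 64, 128]), as reported above
```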