Recovering the attention weight matrix from a Reformer model


When using the chunked self-attention layer in Reformer, the attention weight matrix has a different shape than with global self-attention. The documentation doesn't explain this, so I dug into the code to understand why. It seems to be related to the chunking mechanism.
However, I have struggled to recover the equivalent of the attention weight matrix produced by a classical global attention layer.

Does anyone have any idea how to do this?

Global attention: attention weight shape (batch_size, num_heads, sequence_length, sequence_length)
Chunked attention: attention weight shape (batch_size, num_heads, sequence_length, num_chunks, attn_chunk_length, attn_chunk_length * (1 + num_chunks_before + num_chunks_after))
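In case it helps, here is a minimal sketch of how the chunked weights could be scattered back into a full (sequence_length, sequence_length) matrix. The function name `unchunk_attention` is mine, and I am assuming two things that are worth verifying against `_look_adjacent` in `modeling_reformer.py`: that the gathered key chunks are ordered before-chunks, then the chunk itself, then after-chunks, and that adjacent chunks wrap around circularly at the sequence boundaries.

```python
import numpy as np

def unchunk_attention(chunked, num_chunks_before=1, num_chunks_after=0):
    """Scatter chunked attention weights into a full attention matrix.

    chunked: array of shape (batch, heads, num_chunks, chunk_len,
             chunk_len * (1 + num_chunks_before + num_chunks_after)).
    Returns: array of shape (batch, heads, seq_len, seq_len), where
             seq_len = num_chunks * chunk_len. Positions outside a
             chunk's receptive field are left at zero.

    Assumption: key chunks are concatenated in the order
    [before..., self, after...] and wrap around circularly.
    """
    b, h, n_chunks, c_len, k_dim = chunked.shape
    assert k_dim == c_len * (1 + num_chunks_before + num_chunks_after)
    seq_len = n_chunks * c_len
    full = np.zeros((b, h, seq_len, seq_len), dtype=chunked.dtype)
    # Relative chunk offsets, in the assumed concatenation order.
    offsets = range(-num_chunks_before, num_chunks_after + 1)
    for i in range(n_chunks):
        q_start = i * c_len  # rows covered by this chunk's queries
        for slot, off in enumerate(offsets):
            k_chunk = (i + off) % n_chunks  # circular wrap-around
            k_start = k_chunk * c_len       # columns for this key chunk
            full[:, :, q_start:q_start + c_len, k_start:k_start + c_len] = \
                chunked[:, :, i, :, slot * c_len:(slot + 1) * c_len]
    return full
```

Note that when chunks overlap through the before/after windows, a query position attends to a given key position from only one chunk, so a plain assignment (rather than accumulation) should suffice; rows of the result should still sum to 1 like ordinary softmax weights if the assumptions above hold.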


Did you ever figure this out? I am trying to do the same thing: recover the attention data from a Reformer classification model.

I expect (batch_size, num_heads, num_chunks, seq_chunk_size, seq_chunk_size) but get (batch_size, num_heads, num_chunks, seq_chunk_size, 2 × seq_chunk_size).

Thank you