What is the dimensionality of output_attentions?

Newbie here, but I’m working on a GPT-2 project and trying to visualize the attention weights. The docs say the attentions output is a

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

That sounds like a 1-D tuple of 4-D tensors, but what I actually get back is a 2-D tuple (a tuple of tuples) of 4-D tensors. What does the extra tuple dimension correspond to?

This is my code, nothing special:

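For context, model and tokenizer are just the standard pretrained GPT-2 classes from transformers, loaded roughly like this (I'm assuming the plain "gpt2" checkpoint here; the exact checkpoint shouldn't matter for the question):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # assumption: base GPT-2 checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
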
def generate_text_beam(encoded_seq, num_results=1):
  # keys: sequences, attentions
  outputs_dict = model.generate(
    encoded_seq,
    num_beams=num_results,
    max_length=200,
    num_return_sequences=num_results,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
    output_attentions=True,
    return_dict_in_generate=True,
  )
  # decode each returned sequence back to text
  text = [tokenizer.decode(prediction, skip_special_tokens=True)
          for prediction in outputs_dict['sequences']]
  return (text, outputs_dict['attentions'])

# ...

generated_text, attentions = generate_text_beam(encoded_seq, 1)
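
In case it helps, this is a minimal sketch of how I'm checking the nesting, just printing lengths and one shape from the attentions returned above:

print(len(attentions))         # length of the outer tuple -- the dimension I don't understand
print(len(attentions[0]))      # length of the inner tuple
print(attentions[0][0].shape)  # one of the 4-D tensors, matching the documented shape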