What is the dimensionality of output_attentions?

Newbie here, but I’m working on a GPT-2 project and trying to visualize the attention weights. The docs say the attentions output is a

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

That sounds like a 1-D tuple of 4-D tensors, but what I actually get back is a 2-D tuple (a tuple of tuples) of 4-D tensors. What does the extra tuple dimension correspond to?

This is my code, nothing special:

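For context, model and tokenizer are just the standard pretrained GPT-2 classes from transformers, loaded roughly like this (I'm assuming the plain "gpt2" checkpoint here; the exact checkpoint shouldn't matter for the question):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # assumption: base GPT-2 checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
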
def generate_text_beam(encoded_seq, num_results=1):
  # keys: sequences, attentions
  outputs_dict = model.generate(
    encoded_seq,
    num_beams=num_results,
    max_length=200,
    num_return_sequences=num_results,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
    output_attentions=True,
    return_dict_in_generate=True,
  )
  # decode each returned sequence back to text
  text = [tokenizer.decode(prediction, skip_special_tokens=True)
          for prediction in outputs_dict['sequences']]
  return (text, outputs_dict['attentions'])

# ...

generated_text, attentions = generate_text_beam(encoded_seq, 1)
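
In case it helps, this is a minimal sketch of how I'm checking the nesting, just printing lengths and one shape from the attentions returned above:

print(len(attentions))         # length of the outer tuple -- the dimension I don't understand
print(len(attentions[0]))      # length of the inner tuple
print(attentions[0][0].shape)  # one of the 4-D tensors, matching the documented shape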