```python
# Importing necessary modules
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Loading pre-trained GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Encoding input text
input_ids = tokenizer.encode("The dog is running", return_tensors='pt')

# Generating model output with attention information
output = model.generate(
    input_ids,
    max_length=6,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    output_attentions=True,
    return_dict_in_generate=True,
)

# Extracting attention tensors
attn = output.attentions
```
My observations are as follows.
- The `attn` variable is a tuple with two items, one per newly generated token (the prompt has 4 tokens and `max_length=6`, so 6 - 4 = 2 tokens are generated).
- Each item is itself a tuple of 12 tensors, one per transformer layer of the GPT-2 model.
- The tensors in the first item have shape [1, 12, 4, 4], while those in the second item have shape [1, 12, 1, 5].
- When visualized (see the sketch after this list), the tensor of shape [1, 12, 4, 4] looks like a masked (causal) attention pattern.
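For reference, here is a minimal sketch of how I checked these shapes and looked at one attention head. It assumes the `output` object from the `generate()` call above; matplotlib is only used for the heat map.

```python
import matplotlib.pyplot as plt

attn = output.attentions

# Structure: one entry per generated token, each a tuple of per-layer tensors
print(len(attn))           # 2  -> two newly generated tokens
print(len(attn[0]))        # 12 -> one tensor per transformer layer
print(attn[0][0].shape)    # torch.Size([1, 12, 4, 4])
print(attn[1][0].shape)    # torch.Size([1, 12, 1, 5])

# Heat map of layer 0, head 0 from the first item: a 4x4 lower-triangular
# (masked) pattern over the 4 prompt tokens
plt.imshow(attn[0][0][0, 0].detach().numpy())
plt.colorbar()
plt.show()
```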
Here are my questions.
- What do tensors with shapes [1, 12, 4, 4] and [1, 12, 1, 5] represent? How are they different?
- Which decoding stage does each of these tensors come from?