Returned Tensors and Hidden State

Hi, just quickly getting started with GPT2.

Starting from this snippet:

from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

is said to yield the features of the text.
Upon inspecting the output, it is an irregularly shaped tuple of nested tensors. Looking at the source code for GPT2Model, this is supposed to represent the hidden state. I can guess what some of these dimensions represent (for example, 768 is presumably the hidden/embedding size), but in general I can’t find any documentation on how to interpret the information in output.

I also tried adding:
output = model(**encoded_input, output_attentions=True)
but I do not know how to interpret the dimensions of this either.
The docstring tells me to “See attentions under returned tensors for more detail.” at

But I cannot find what this is referring to. Can someone help me interpret the dimensions of these nested tuples?

Please refer to the GPT2 docs. They give a detailed description of what GPT2Model is supposed to return.

It returns (in order of output):

  • last_hidden_state: (batch_size, sequence_length, hidden_size)
  • past: a tuple with one tensor per layer, each of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head); used to speed up sequential decoding
  • hidden_states: returned when output_hidden_states=True; a tuple of n_layer + 1 tensors (the embedding output plus one per block), each (batch_size, sequence_length, hidden_size)
  • attentions: returned when output_attentions=True; a tuple of n_layer tensors, each (batch_size, num_heads, sequence_length, sequence_length)
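A quick way to check these shapes yourself is to print them. Here is a minimal sketch; it uses a tiny randomly initialised GPT-2 config (the small sizes are illustrative assumptions, not the real 'gpt2' checkpoint, which has n_layer=12, n_head=12, n_embd=768) so it runs without downloading weights:

```python
import torch
from transformers import GPT2Config, GPT2Model

# Tiny random model so this runs without downloading the 'gpt2' checkpoint.
config = GPT2Config(n_layer=2, n_head=2, n_embd=8, vocab_size=100)
model = GPT2Model(config)
model.eval()

input_ids = torch.tensor([[1, 2, 3, 4, 5]])  # (batch_size=1, sequence_length=5)
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True, output_attentions=True)

# last_hidden_state: (batch_size, sequence_length, hidden_size)
print(out.last_hidden_state.shape)   # torch.Size([1, 5, 8])

# hidden_states: n_layer + 1 tensors (embeddings plus one per block),
# each (batch_size, sequence_length, hidden_size)
print(len(out.hidden_states))        # 3

# attentions: n_layer tensors,
# each (batch_size, num_heads, sequence_length, sequence_length)
print(out.attentions[0].shape)       # torch.Size([1, 2, 5, 5])
```

Swapping in GPT2Model.from_pretrained('gpt2') gives the same structure with the full-size dimensions.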

Can’t believe I missed this

Hey @azhx,

if you are on master, you can also use the ModelOutput object, a dict-like object that lets you access the outputs as out.attentions, out.hidden_states, etc. For GPT2Model it returns BaseModelOutputWithPast. You can find the docs here
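As a small sketch of what the dict-like access looks like (again using a tiny random config, an assumption made only so the example runs without downloading weights), attribute, key, and positional access all refer to the same tensors:

```python
import torch
from transformers import GPT2Config, GPT2Model

# Tiny random model so the example is self-contained (no download needed).
model = GPT2Model(GPT2Config(n_layer=2, n_head=2, n_embd=8, vocab_size=100))
model.eval()

with torch.no_grad():
    out = model(torch.tensor([[1, 2, 3]]), output_attentions=True)

# ModelOutput supports attribute access, key access, and tuple-style indexing;
# all three return the very same tensor object.
a = out.last_hidden_state
b = out['last_hidden_state']
c = out[0]
print(a is b and b is c)        # True

# Optional fields like attentions are only populated when requested.
print(out.attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)
```

This is generally more readable than indexing into the returned tuple by position.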


Yes, I’m on master now and using this, thanks!