Difference in output logits when using a subsection of the input sentence

michaelas · January 8, 2023, 4:00pm

I am using a pre-trained DialoGPT model (GPT-2 architecture) to generate text from input sentences.

If my understanding is correct, the masked (causal) self-attention mechanism means that each position attends only to itself and to earlier positions. It should therefore make no difference whether the input is the whole sentence or a subsection starting from the beginning of the sentence. The subsection should produce the same outputs as the whole sentence, just for fewer positions.
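
To make that expectation concrete, here is a minimal sketch of single-head causal attention in plain Python. It is not the GPT-2 implementation: the `causal_attention` function and the toy `tokens` vectors are made up for illustration, and the same vectors are reused as queries, keys and values to keep it short.

```python
import math

def causal_attention(queries, keys, values):
    """Single-head causal self-attention over lists of equal-length vectors."""
    outputs = []
    for i, q in enumerate(queries):
        # Position i may only attend to positions 0..i (the causal mask).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(len(q))
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the visible value vectors, one dimension at a time.
        outputs.append([
            sum(w / total * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs

# Toy "token" vectors (illustrative values only).
tokens = [[0.1, 0.3], [0.7, -0.2], [-0.5, 0.4], [0.9, 0.1], [0.2, 0.2], [-0.3, 0.8]]

full = causal_attention(tokens, tokens, tokens)
prefix = causal_attention(tokens[:3], tokens[:3], tokens[:3])

# The first three output rows do not depend on the later tokens.
print(full[:3] == prefix)  # True
```

In this sketch the outputs match exactly because every position is computed with the same operations in the same order, whether or not later tokens are present.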

However, while the outputs are almost identical, I observe differences on the order of 1e-8 in many elements of the output logit tensors. With output_attentions=True, the attention scores are slightly different too. I generate the output in this way:

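# tokenizer and chatbot_model are the pre-trained DialoGPT tokenizer and model, loaded beforehand
sentence = "I wonder what the model will do with this input sentence."
# Encode the sentence as a tensor of token ids with shape (1, sequence_length)
enc_sentence = tokenizer.encode(sentence, return_tensors="pt")
# Build the keyword arguments for the forward pass
chatbot_input_ids = chatbot_model.prepare_inputs_for_generation(enc_sentence)
chatbot_output = chatbot_model(**chatbot_input_ids)
# Logits have shape (1, sequence_length, vocab_size)
logits = chatbot_output.logits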

When I use enc_sentence[:, :5] instead of enc_sentence as input, the resulting logits differ from logits[:, :5, :] of the full input. What could be the reason for this?
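
One plausible explanation, stated here as an assumption and not confirmed for this model, is floating-point rounding: floating-point addition is not associative, so if the same numbers are summed in a different order for the short input than for the full input, the results can differ in the last bits. The sketch below shows the effect in plain Python and compares two rows of values with a tolerance instead of exact equality. The `rows_match` helper, the tolerance of 1e-6 and the row values are all made up for illustration.

```python
import math

# The same three numbers, added in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first == right_first)       # False
print(abs(left_first - right_first))   # tiny, but not zero

def rows_match(row_a, row_b, abs_tol=1e-6):
    """Hypothetical helper: compare two rows of logits within a tolerance."""
    return len(row_a) == len(row_b) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
        for a, b in zip(row_a, row_b)
    )

# Made-up values standing in for one row of logits from each run.
full_row = [1.25, -0.5, 3.0]
prefix_row = [1.25 + 1e-8, -0.5, 3.0 - 1e-8]

exact_equal = full_row == prefix_row
close_enough = rows_match(full_row, prefix_row)
print(exact_equal, close_enough)  # False True
```

If rounding is indeed the cause, a tolerance-based comparison like this is the more meaningful check than exact equality.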
