I am using a pre-trained DialoGPT model (which is based on GPT-2) to generate text from input sentences.
If my understanding is correct, the masked (causal) self-attention mechanism should make it irrelevant whether I feed in the whole sentence or only a prefix of it starting from the beginning: the output for the prefix should match the corresponding positions of the full-sentence output, just truncated.
However, while the outputs are almost identical, many elements of the output logit tensors differ by values on the order of 1e-8. With `output_attentions=True`, the attention scores are slightly different too. I generate the output like this:
```python
sentence = "I wonder what the model will do with this input sentence."
enc_sentence = tokenizer.encode(sentence, return_tensors="pt")
chatbot_input_ids = chatbot_model.prepare_inputs_for_generation(enc_sentence)
chatbot_output = chatbot_model(**chatbot_input_ids)
logits = chatbot_output.logits
```
When I use `enc_sentence[:, :5]` as input instead of `enc_sentence`, the resulting logits differ from `logits[:, :5, :]` of the full-sentence run. What could be the reason for this?
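To show the invariance I expect without depending on the DialoGPT weights, here is a self-contained sketch using a toy single-head causal self-attention in NumPy (identity projections, random inputs; all names here are my own and not from the actual model). Masked positions get exactly zero attention weight, so the prefix rows of the output should agree with the full run up to floating-point noise:

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head causal self-attention (no learned projections)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (n, n) raw attention scores
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)      # position i sees only j <= i
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)                      # exp(-inf) == 0 exactly
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                            # (n, d) attended outputs

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 16))

out_full = causal_self_attention(x)        # run on the full "sentence"
out_prefix = causal_self_attention(x[:5])  # run on the first 5 positions only

# Mathematically identical; any residual difference comes only from
# floating-point reduction order in the differently shaped matmuls.
print(np.max(np.abs(out_full[:5] - out_prefix)))
```

In this toy setting the difference is at or near machine precision, which is consistent with the ~1e-8 deviations I see from the real model in float32.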