GPT2Model model output inconsistency between different transformers versions

Possibly related to this phenomenon.

Also, the KV cache code is one area that has changed substantially in recent releases.
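
One rough way to check whether the KV cache path is involved is to compare a full forward pass against a cached, incremental one and run the same script under each installed transformers version. The sketch below is only an illustration, not a confirmed reproduction: the tiny randomly initialized config, the seed, and the helper name `cache_consistency_gap` are all assumptions made here, and no pretrained weights are loaded.

```python
import torch
import transformers
from transformers import GPT2Config, GPT2Model


def cache_consistency_gap(seed: int = 0) -> float:
    """Max abs difference between uncached and cached outputs for the last token.

    Hypothetical helper: uses a tiny randomly initialized GPT-2, not real weights.
    """
    torch.manual_seed(seed)
    config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=2, n_head=2)
    model = GPT2Model(config).eval()  # eval() disables dropout
    input_ids = torch.randint(0, config.vocab_size, (1, 8))

    with torch.no_grad():
        # Reference: whole sequence in one pass, no cache.
        full = model(input_ids=input_ids, use_cache=False).last_hidden_state[:, -1]
        # Cached path: encode the prefix, then feed only the last token.
        prefix = model(input_ids=input_ids[:, :-1], use_cache=True)
        step = model(
            input_ids=input_ids[:, -1:],
            past_key_values=prefix.past_key_values,
            use_cache=True,
        ).last_hidden_state[:, -1]

    return (full - step).abs().max().item()


if __name__ == "__main__":
    print(transformers.__version__, cache_consistency_gap())
```

If the gap is tiny in one version and noticeably larger in another, that would point toward the cache handling; if it is tiny in both, the difference probably comes from somewhere else.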