Question about query_length in modeling_t5.py

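When past_key_value is not None, the query_length passed down from the self-attention layer is already the length of past_key_value + hidden_states. Because seq_length is then added to it again, the real_seq_length computed in the cross-attention layer counts the current hidden_states twice, which is wrong.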
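To make the arithmetic concrete, here is a minimal sketch that mirrors the length bookkeeping with plain integers instead of tensors. The function real_seq_length and the example lengths are hypothetical simplifications for illustration, not the actual transformers code; they only follow the pattern "seq_length plus either the cached length or query_length".

```python
# Hypothetical, simplified mirror of the length bookkeeping in T5Attention.forward.
# Tensors are replaced by plain integers; this is an illustration, not library code.

def real_seq_length(seq_length, past_length=None, query_length=None):
    """Start from the current length; if a cache exists, add either the cached
    length or, when provided, query_length."""
    total = seq_length
    if past_length is not None:
        total += past_length if query_length is None else query_length
    return total

enc_len = 3   # hypothetical encoder length (cached keys in cross-attention)
past_len = 4  # hypothetical number of decoder tokens already cached
cur_len = 1   # one new decoder token per generation step

# Self-attention: past + current, as expected.
self_len = real_seq_length(cur_len, past_length=past_len)

# Cross-attention receives the self-attention length (past + current) as
# query_length, then adds cur_len a second time.
cross_len = real_seq_length(cur_len, past_length=enc_len, query_length=self_len)

print(self_len, cross_len)  # 5 6
```

The correct query length at this step is 5, but the cross-attention side ends up with 6.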

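Luckily, this does not affect EncDecAttention, since its position_bias is always all zeros: once the bias is sliced down to the current query positions, a zero bias looks the same no matter how many rows it was built with.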
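A small sketch of why the zero bias hides the problem, again with plain lists. build_bias and slice_to_current are hypothetical helpers; the relative=True case uses a simple key_pos - query_pos value only to show what a position-dependent bias would do, and is not T5's bucketed relative bias.

```python
# Hypothetical helpers illustrating the slicing step; not transformers code.

def build_bias(query_len, key_len, relative):
    """Return a query_len x key_len grid. relative=True fills in
    (key_pos - query_pos) as a stand-in for a position-dependent bias;
    relative=False returns all zeros, like a layer without relative bias."""
    return [[(j - i) if relative else 0 for j in range(key_len)]
            for i in range(query_len)]

def slice_to_current(bias, cur_len):
    # Keep only the last cur_len rows, analogous to position_bias[:, :, -cur_len:, :]
    return bias[-cur_len:]

enc_len, cur_len = 3, 1
correct_q, wrong_q = 5, 6  # correct vs. double-counted query length

zeros_ok = slice_to_current(build_bias(correct_q, enc_len, False), cur_len)
zeros_bad = slice_to_current(build_bias(wrong_q, enc_len, False), cur_len)
rel_ok = slice_to_current(build_bias(correct_q, enc_len, True), cur_len)
rel_bad = slice_to_current(build_bias(wrong_q, enc_len, True), cur_len)

print(zeros_ok == zeros_bad)  # True
print(rel_ok == rel_bad)      # False
```

With zeros the wrong length is invisible, while any bias that depends on the query position would come out shifted.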

Correct me if I am wrong.