Question about query_length in modeling_t5.py

GeneZC · April 18, 2022, 12:23pm

huggingface/transformers/blob/0a057201a96565df29984d716f660fd8d634329a/src/transformers/models/t5/modeling_t5.py#L464

    # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length)
    # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head)
    batch_size, seq_length = hidden_states.shape[:2]

    real_seq_length = seq_length

    if past_key_value is not None:
        assert (
            len(past_key_value) == 2
        ), f"past_key_value should have 2 past states: keys and values. Got { len(past_key_value)} past states"
        real_seq_length += past_key_value[0].shape[2] if query_length is None else query_length

    key_length = real_seq_length if key_value_states is None else key_value_states.shape[1]

    def shape(states):
        """projection"""
        return states.view(batch_size, -1, self.n_heads, self.key_value_proj_dim).transpose(1, 2)

    def unshape(states):
        """reshape"""
        return states.transpose(1, 2).contiguous().view(batch_size, -1, self.inner_dim)

If past_key_value is not None, the query_length passed on from SelfAttention can equal the length of past_key_value plus the length of hidden_states, which is actually wrong.

Luckily, this problematic behavior does not affect EncDecAttention, since its position_bias is always all zeros.
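To make the arithmetic concrete, here is a minimal pure-Python sketch of the length bookkeeping in the snippet above. The helper name `attention_lengths` and the example sizes are hypothetical, and the value passed as `query_length` in the cross-attention case is my assumption about what the caller supplies (the cached self-attention length including the current token), not something I have confirmed.

```python
def attention_lengths(seq_length, past_length=None, query_length=None, encoder_length=None):
    """Hypothetical helper mirroring the length logic of the quoted snippet.

    past_length stands in for past_key_value[0].shape[2] (None means no cache),
    encoder_length stands in for key_value_states.shape[1] (None means self-attention).
    """
    real_seq_length = seq_length
    if past_length is not None:
        # Same expression as in the snippet: query_length overrides the cached length.
        real_seq_length += past_length if query_length is None else query_length
    key_length = real_seq_length if encoder_length is None else encoder_length
    return real_seq_length, key_length


# Self-attention decoding step: 1 new token, 4 cached tokens.
self_attn = attention_lengths(seq_length=1, past_length=4)

# Cross-attention in the same step, ASSUMING query_length = 5 is passed in
# (4 cached + 1 current) and the encoder output has 7 positions.
cross_attn = attention_lengths(seq_length=1, past_length=7, query_length=5, encoder_length=7)

print(self_attn)   # (5, 5)
print(cross_attn)  # (6, 7)
```

Under that assumption the cross-attention case ends up with a real_seq_length of 6 even though only 5 decoder positions exist, which is the mismatch I am asking about.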

Correct me if I am wrong.
