In `transformers/src/transformers/modeling_gpt2.py`, what are the `nd` and `ns` variables on line 150?
```python
def _attn(self, q, k, v, attention_mask=None, head_mask=None, output_attentions=False):
    w = torch.matmul(q, k)
    if self.scale:
        w = w / (float(v.size(-1)) ** 0.5)
    nd, ns = w.size(-2), w.size(-1)
    mask = self.bias[:, :, ns - nd : ns, :ns]
```
Since the GPT model performs self-attention, aren't `nd` and `ns` always the same?
What is the meaning of `ns - nd : ns`?
As you can see on line 187,

```python
query, key, value = x.split(self.split_size, dim=2)
```

the query and key come from the same input, so they should have the same sequence length, i.e. `nd` should always equal `ns`.
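For what it's worth, `nd` (query length) and `ns` (key length) can differ when cached past key/values are used during incremental generation: the query then holds only the new token(s), while the key spans the full history, so `self.bias[:, :, ns - nd : ns, :ns]` selects just the mask rows belonging to the new positions. A minimal sketch of that slicing with plain Python lists instead of tensors (the `n_ctx` value here is made up for illustration):

```python
# self.bias in GPT-2 is a precomputed lower-triangular causal mask of
# shape (n_ctx, n_ctx); we imitate it with a 2-D list of 0/1 values.
n_ctx = 8  # hypothetical context size for this sketch
bias = [[1 if j <= i else 0 for j in range(n_ctx)] for i in range(n_ctx)]

def causal_mask(nd, ns):
    """Mimics self.bias[:, :, ns - nd : ns, :ns] on the 2-D list above."""
    return [row[:ns] for row in bias[ns - nd : ns]]

# Prompt pass: no cache, so nd == ns == 3 -> the full lower triangle.
print(causal_mask(3, 3))  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]

# One decoding step with 3 cached tokens: nd == 1 new query position,
# ns == 4 total key positions -> the single new row attends to all keys.
print(causal_mask(1, 4))  # [[1, 1, 1, 1]]
```

So the slice `ns - nd : ns` aligns the *last* `nd` rows of the causal mask with the `nd` new query positions, which is exactly what incremental decoding needs.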