I have a question about the sizes of the query, key and value vectors. Based on this paper, and as demonstrated in this Medium post, I expected the query, key and value vectors to have size [seq_length x seq_length]. But when I print the sizes of the model parameters as shown below, they are [768 x 768].
    for name, param in model.named_parameters():
        print(name, param.size())

    >>>
    bert.bert.encoder.layer.0.attention.self.query.weight torch.Size([768, 768])
    bert.bert.encoder.layer.0.attention.self.key.weight torch.Size([768, 768])
    bert.bert.encoder.layer.0.attention.self.value.weight torch.Size([768, 768])
I am really confused and feel like I am missing something. Could someone please help me figure it out?
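One way to see where each shape shows up is to separate the projection weights (which is what `named_parameters()` prints) from the tensors computed during a forward pass. The following is only a minimal sketch under stated assumptions: it uses randomly initialised `nn.Linear` layers as stand-ins for the query and key projections, an arbitrary `seq_length` of 10, and a single attention head (it ignores the split into multiple heads). It is not the actual BERT implementation.

```python
import torch
from torch import nn

hidden_size = 768  # matches the [768, 768] weights printed above
seq_length = 10    # arbitrary example length (assumption)

# Hypothetical stand-ins for the query/key projections;
# randomly initialised, not BERT's trained weights.
query_proj = nn.Linear(hidden_size, hidden_size)
key_proj = nn.Linear(hidden_size, hidden_size)

# One sequence of token embeddings: [seq_length, hidden_size]
hidden_states = torch.randn(seq_length, hidden_size)

# Projecting the tokens gives one query/key vector per token.
query = query_proj(hidden_states)  # [seq_length, hidden_size]
key = key_proj(hidden_states)      # [seq_length, hidden_size]

# Scaled dot-product scores between every pair of tokens
# (single head for simplicity).
scores = query @ key.T / hidden_size ** 0.5  # [seq_length, seq_length]
attn = torch.softmax(scores, dim=-1)

print(query_proj.weight.shape)  # parameter shape, independent of seq_length
print(query.shape)              # per-token query vectors
print(attn.shape)               # attention matrix over token pairs
```

In this sketch the weight matrix is [hidden_size x hidden_size] regardless of the input length, while the only [seq_length x seq_length] tensor is the attention score matrix computed from the projected queries and keys.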