Hi all, I'm new to BERT. When I use BERT to calculate the word embeddings for a sentence and I specify max_length with padding turned on, I get an embedding matrix of size [1, max_length, 768]. For example:
from transformers import BertTokenizer, BertModel

tz = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")

# The sentence to be encoded
sent = "deep learning!"

# Encode the sentence
encoded = tz.encode_plus(
    text=sent,                  # the sentence to be encoded
    add_special_tokens=True,    # add [CLS] and [SEP]
    max_length=10,              # maximum length of a sentence
    padding='max_length',       # add [PAD]s up to max_length
    return_tensors='pt',        # return PyTorch tensors
)
# {'input_ids': tensor([[ 101, 1996, 3776, 106, 102, 0, 0, 0, 0, 0]]),
# 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
# 'attention_mask': tensor([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])}
output0 = model(**encoded)[0] # size: [1, 10, 768]
tensor([[[-0.1983, 0.2368, -0.0717, ..., -0.2809, 0.0442, 0.0142],
[ 0.2063, -0.7768, -0.0193, ..., -0.4573, 0.1309, -0.1332],
[-0.3191, -0.9506, 0.1103, ..., -0.2109, 0.1079, -0.2463],
...,
[-0.1037, -0.0860, 0.3230, ..., -0.1607, 0.0107, -0.0919],
[-0.3363, -0.2425, 0.2199, ..., -0.0281, 0.0726, -0.0862],
[-0.2224, -0.1177, 0.3508, ..., -0.1924, 0.0387, 0.0186]]],
grad_fn=<NativeLayerNormBackward0>)
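As a quick check I added on my side (not part of the question itself), converting the padded input_ids back to tokens confirms that the last five positions are [PAD], yet output0 still has a 768-dimensional vector at each of them:

# Sanity check (my own addition): the trailing 0 ids are [PAD] tokens,
# but output0 still contains a contextual vector at each of those positions.
print(tz.convert_ids_to_tokens(encoded['input_ids'][0].tolist()))
# ['[CLS]', ..., '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
print(output0.shape)  # torch.Size([1, 10, 768])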
However, if I turn padding off (the sentence is shorter than max_length, so nothing gets padded), I get the following embedding:
encoded = tz.encode_plus(
    text=sent,                  # the sentence to be encoded
    add_special_tokens=True,    # add [CLS] and [SEP]
    max_length=64,              # maximum length of a sentence
    padding=False,              # do not add [PAD]s
    return_attention_mask=True, # generate the attention mask
    return_tensors='pt',        # return PyTorch tensors
)
# {'input_ids': tensor([[ 101, 1996, 3776, 106, 102]]),
# 'token_type_ids': tensor([[0, 0, 0, 0, 0]]),
# 'attention_mask': tensor([[1, 1, 1, 1, 1]])}
output1 = model(**encoded)[0] # size: [1, 5, 768]
tensor([[[-0.1983, 0.2368, -0.0717, ..., -0.2809, 0.0442, 0.0142],
[ 0.2063, -0.7768, -0.0193, ..., -0.4573, 0.1309, -0.1332],
[-0.3191, -0.9506, 0.1103, ..., -0.2109, 0.1079, -0.2463],
[-0.2446, -0.1497, -0.2388, ..., 0.3796, 0.3198, -0.1393],
[ 0.7812, 0.2023, -0.2524, ..., 0.1389, -0.7544, -0.2721]]],
grad_fn=<NativeLayerNormBackward0>)
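One thing I notice (this comparison is my own addition, assuming output0 is still the result of the padded run above): the first five rows of output0 are, to the printed precision, the same as output1; only the padded tail differs.

import torch

# My own check, not part of the original code: with the attention mask in place,
# the non-[PAD] positions of the padded run should match the unpadded run
# up to small numerical differences.
print(torch.allclose(output0[:, :5, :], output1, atol=1e-4))  # I expect True, or very close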
I have two questions about this:
- Why do the padding positions also get word embeddings? Are those padding-position embeddings meaningful?
- If I want to calculate a similarity matrix between the word embeddings of two sentences, should I mask the padding embeddings to 0 and then calculate the similarity matrix? Or how else should I calculate the similarity matrix for two sentences that contain padding embeddings? For example, see the sketch right after this list:
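Something like the following is what I have in mind; this is only a rough sketch, and sim_matrix, out_a, out_b, encoded_a, encoded_b are placeholder names I made up, not anything from the library:

import torch
import torch.nn.functional as F

def sim_matrix(emb_a, mask_a, emb_b, mask_b):
    # emb_*: [1, len_*, 768] last hidden states, mask_*: [1, len_*] attention masks.
    a = F.normalize(emb_a[0], dim=-1)   # unit-length token vectors, [len_a, 768]
    b = F.normalize(emb_b[0], dim=-1)   # unit-length token vectors, [len_b, 768]
    sim = a @ b.T                       # cosine similarity for every token pair, [len_a, len_b]
    keep = mask_a[0].unsqueeze(1) * mask_b[0].unsqueeze(0)  # 1 only where both tokens are real
    return sim * keep                   # zero out every entry that involves a [PAD] position

# Intended usage (encoded_a/encoded_b are two padded encodings, out_a/out_b their model outputs):
# S = sim_matrix(out_a, encoded_a['attention_mask'], out_b, encoded_b['attention_mask'])

Is masking like this the right way to handle the [PAD] positions, or is there a better approach?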