Same PAD Position but Different PAD Embedding

cor3ntino · March 18, 2021, 10:58am

Hi everyone,

I am asking my question here since I couldn’t find an answer to it anywhere.

I am a junior NLP engineer and I am experiencing some trouble with the BERT model, more specifically with the embeddings it returns.

I am concerned about the fact that the PAD embeddings are not all the same.
I have seen forum posts explaining that this is because the PAD embeddings depend directly on the positional encoding, which I agree with. Nevertheless, take two simple sentences like “hello, I am a boy” and “hello, I am a girl” in a batch with other, longer sentences. Both sentences are padded, and their pads have exactly the same positional encodings; yet the PAD embeddings still differ in the end, even with the model in .eval() mode. It can’t be due to the attention layers, because of the attention masks. It can’t be due to the randomness of dropout, since I turned it off. And it can’t be due to the positional encoding, because the pads sit at exactly the same positions in the two sentences.

Would anybody have an answer to my concerns? I understand that I can just “ignore” the pad embeddings if I just want information about the word embeddings, but I still would like to understand.

Have a nice day,
Thank you!
Coco

cor3ntino · March 23, 2021, 2:37pm

Bump, please! Does anybody have an idea? This must be bothering more than one person here.
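
The setup in the question can be sketched with a toy single-head self-attention layer in plain Python. This is only an illustration, not BERT itself: the embedding values, `TOKEN_EMB`, `POS_EMB`, and `masked_self_attention` are made up for the example, and it assumes the attention mask only removes [PAD] positions as keys (which, as far as I know, is how the additive mask in Hugging Face's `BertModel` behaves), so a [PAD] position still acts as a query over the real tokens.

```python
import math

# Hypothetical toy embeddings (not BERT weights), chosen only for illustration.
TOKEN_EMB = {
    "hello": [1.0, 0.0, 0.5],
    "boy": [0.0, 1.0, 0.2],
    "girl": [0.3, -1.0, 0.8],
    "[PAD]": [0.1, 0.1, 0.1],
}
POS_EMB = [[0.0, 0.0, 0.1], [0.0, 0.1, 0.0], [0.1, 0.0, 0.0]]


def embed(tokens):
    """Input embedding = token embedding + positional embedding."""
    return [[t + p for t, p in zip(TOKEN_EMB[tok], POS_EMB[i])]
            for i, tok in enumerate(tokens)]


def masked_self_attention(x, key_mask):
    """Single-head attention with Q = K = V = x (assumed identity projections).

    The mask is applied to keys only: masked positions get zero attention
    weight, but every position, including [PAD], still produces an output.
    """
    scale = math.sqrt(len(x[0]))
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights = [math.exp(s) if m else 0.0 for s, m in zip(scores, key_mask)]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, x)) / total
                    for d in range(len(q))])
    return out


sent_a = ["hello", "boy", "[PAD]"]
sent_b = ["hello", "girl", "[PAD]"]
mask = [1, 1, 0]  # 1 = real token, 0 = padding

in_a, in_b = embed(sent_a), embed(sent_b)
out_a = masked_self_attention(in_a, mask)
out_b = masked_self_attention(in_b, mask)

# Same token and same position, so the [PAD] inputs are identical.
pad_inputs_equal = in_a[2] == in_b[2]
# The [PAD] outputs are weighted averages of the real tokens of each sentence.
pad_outputs_equal = all(math.isclose(a, b) for a, b in zip(out_a[2], out_b[2]))

print("PAD inputs equal: ", pad_inputs_equal)
print("PAD outputs equal:", pad_outputs_equal)
```

Under these assumptions the two [PAD] inputs match but the [PAD] outputs do not, because each one is a mixture of the real tokens in its own sentence. If the real model works the same way, that would be consistent with what the question observes, even with dropout disabled and identical positions.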
