Attention mask and token ids

Hi,
I am taking the following wonderful course: Transformers.
When we pad, we pad the sequence with 0 and ask the model not to consider the padding. I was wondering: is there some token with id = 0? Because in that case we would be ignoring a real token with id = 0, which is not good. Could anybody please help me here?
Thank you very much.


First, you’re right: we wouldn’t want to ignore real input.
That’s why we use a dedicated padding token.

There are different special tokens, such as the padding (PAD) token, beginning-of-sentence (BOS) token, end-of-sentence (EOS) token, unknown (UNK) token, and more.
Since we are ultimately working with tensors of numbers, every token, including each special token, has a corresponding token id. In other words, the special tokens are also mapped to numbers.
In many tokenizers (for example, BERT’s), the padding token’s id happens to be 0, so when you pad with 0 you are actually using the padding token, not a real word, which is great :slight_smile: That said, this is not universal: you can always check with `tokenizer.pad_token_id`, and some models (such as GPT-2) define no padding token by default. The attention mask, not the id value itself, is what tells the model which positions to ignore.
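To make this concrete, here is a minimal sketch in plain Python (no libraries) of how padding and the attention mask work together. The function name `pad_batch` and the constant `PAD_ID` are hypothetical, and the pad id of 0 is an assumption matching BERT’s vocabulary:

```python
PAD_ID = 0  # assumed pad token id, as in BERT's vocabulary

def pad_batch(sequences, pad_id=PAD_ID):
    """Pad token-id sequences to equal length and build an attention mask."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        pad_len = max_len - len(s)
        input_ids.append(s + [pad_id] * pad_len)
        # 1 = real token (attend to it), 0 = padding (ignore it)
        attention_mask.append([1] * len(s) + [0] * pad_len)
    return input_ids, attention_mask

ids, mask = pad_batch([[101, 7592, 102], [101, 102]])
print(ids)   # [[101, 7592, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

Note that even if a real token in the vocabulary had id 0, the model would still distinguish it from padding, because the attention mask marks real positions with 1 regardless of their id.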