Attention mask and token ids

Hi,
I am taking the following wonderful course: Transformers.
When we pad, we pad the sequence with 0 and ask the model not to consider the padding. I was wondering: is there some token with id = 0? Because in that case we would be ignoring a real token with id = 0, which is not good. Could anybody please help me here?
Thank you very much.


First, you’re right: we wouldn’t want to ignore real input.
That’s why we use a dedicated padding token.

There are different special tokens, such as the padding (PAD) token, beginning-of-sentence (BOS) token, end-of-sentence (EOS) token, unknown (UNK) token, and more.
Since we are ultimately working with tensors of numbers, every token, including each special token, has a corresponding token id. In other words, the special tokens are also mapped to numbers.
In many tokenizers (for example, BERT’s), the padding token’s id happens to be 0, so when you pad with 0 you are actually using the padding token, not a real word, which is great :slight_smile: That said, this is not universal: you can always check with `tokenizer.pad_token_id`, and some models (such as GPT-2) define no padding token by default. The attention mask, not the id value itself, is what tells the model which positions to ignore.
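To make this concrete, here is a minimal sketch in plain Python (no libraries) of how padding and the attention mask work together. The function name `pad_batch` and the constant `PAD_ID` are hypothetical, and the pad id of 0 is an assumption matching BERT’s vocabulary:

```python
PAD_ID = 0  # assumed pad token id, as in BERT's vocabulary

def pad_batch(sequences, pad_id=PAD_ID):
    """Pad token-id sequences to equal length and build an attention mask."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        pad_len = max_len - len(s)
        input_ids.append(s + [pad_id] * pad_len)
        # 1 = real token (attend to it), 0 = padding (ignore it)
        attention_mask.append([1] * len(s) + [0] * pad_len)
    return input_ids, attention_mask

ids, mask = pad_batch([[101, 7592, 102], [101, 102]])
print(ids)   # [[101, 7592, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

Note that even if a real token in the vocabulary had id 0, the model would still distinguish it from padding, because the attention mask marks real positions with 1 regardless of their id.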