Pad Token and attention mask. What is the difference?

Hi there.

I apologise if the answer to this is in the docs (if so, I cannot find it). I am training BigBird from scratch, and I have noticed that `BigBirdConfig` takes a pad token id (`pad_token_id`), while the model's forward pass separately accepts an attention mask (`attention_mask`).

If I supply the id of my pad token, will the model automatically generate an attention mask that masks out the padded positions? Or do I also need to build and pass an attention mask for each input sequence myself?
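To make the question concrete, this is the kind of mask I mean (the token ids and pad id below are made up for illustration). I want to know whether the model derives something like this from `pad_token_id` on its own, or whether I am expected to construct it and pass it as `attention_mask`:

```python
pad_token_id = 0  # illustrative; matches the pad_token_id I set in BigBirdConfig

batch = [
    [5, 17, 42, 9],  # full-length sequence
    [5, 17, 0, 0],   # sequence padded with pad_token_id
]

# 1 for real tokens, 0 for padding positions
attention_mask = [[int(tok != pad_token_id) for tok in seq] for seq in batch]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```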