Hello everyone,
I can’t understand the role of the attention_mask argument in transformers.BertModel. I mean, when we pad a sequence, the tokenizer adds zeros to the end of the sentence (and zero is specific to padding). Why do we have to pass attention_mask explicitly when the zeros obviously mark the padding (and will even be cancelled out during the matrix multiplications in the encoder layers)?
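Here is a minimal sketch of the situation I mean, assuming the standard bert-base-uncased checkpoint (where the pad token id happens to be 0):

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Pad two sentences of different lengths to the same length.
batch = tokenizer(
    ["Hello world", "A somewhat longer sentence that forces padding"],
    padding=True,
    return_tensors="pt",
)

print(batch["input_ids"])       # the short sentence is padded with 0s at the end
print(batch["attention_mask"])  # 1 for real tokens, 0 for the padding positions

# Why pass attention_mask explicitly, if the 0s in input_ids already mark padding?
outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
```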