Hi. My issue relates to this issue. I wanted to have a graph-like input and was trying to mimic the behaviour of graph neural networks by masking the tokens (nodes) outside of the neighbourhood. Specifically, if I had, for example, a graph like `I -> am -> hungry`, and the mapping from nodes to tokens was one to one, I would like to have an attention mask like `[[1, 1, 0], [0, 1, 1], [0, 0, 1]]`, meaning the token `I` would attend to itself and to the token `am`, but not to `hungry`, since it is outside its neighbourhood.
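For concreteness, here is a minimal sketch of how such a neighbourhood mask could be built from the graph's adjacency matrix; the toy 3-node graph and variable names are just for illustration:

```python
import torch

# Adjacency of the directed graph I -> am -> hungry
# (row i == 1 at column j means node i may attend to node j)
adjacency = torch.tensor([[0, 1, 0],
                          [0, 0, 1],
                          [0, 0, 0]])

# Each node also attends to itself, which gives the mask from the example above
neighbourhood_mask = adjacency + torch.eye(3, dtype=adjacency.dtype)
print(neighbourhood_mask)
# tensor([[1, 1, 0],
#         [0, 1, 1],
#         [0, 0, 1]])
```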
Is this behaviour possible? I have been searching and found the `get_extended_attention_mask` function. It does work to send a 3D attention mask to the model. However, this line says that the 3D attention mask should have shape (batch, from_seq_len, to_seq_len), which suggests that this mask is for cross-attention. Are `attention_mask`s meant for self-attention or cross-attention?
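For reference, here is a minimal sketch of how I am passing the 3D mask, assuming a BERT-style encoder (`bert-base-uncased` and the tokenisation are just for illustration, and the exact behaviour may depend on the transformers version and attention implementation):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# "I am hungry" tokenises to exactly three word pieces without special tokens,
# so the token-to-node mapping is one to one in this toy case
inputs = tokenizer("I am hungry", return_tensors="pt", add_special_tokens=False)

# Neighbourhood mask from the graph, with a leading batch dimension:
# shape (batch, from_seq_len, to_seq_len) = (1, 3, 3)
mask_3d = torch.tensor([[[1, 1, 0],
                         [0, 1, 1],
                         [0, 0, 1]]])

# get_extended_attention_mask broadcasts a 3D mask to
# (batch, 1, from_seq_len, to_seq_len) and turns the zeros into large
# negative values that get added to the attention scores
extended = model.get_extended_attention_mask(mask_3d, inputs["input_ids"].shape)
print(extended.shape)  # torch.Size([1, 1, 3, 3])

# Passing the 3D mask directly as attention_mask also runs
outputs = model(input_ids=inputs["input_ids"], attention_mask=mask_3d)
print(outputs.last_hidden_state.shape)  # (1, 3, hidden_size)
```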
HuggingFace is still working on it. There is a PR to support passing custom attention masks: Allow passing 2D attention mask · Issue #27640 · huggingface/transformers · GitHub