Doubts about attention masks

Hi. My question relates to this issue. I want to feed a graph-like input to the model and was trying to mimic the behaviour of a graph neural network by masking out tokens (nodes) outside a node's neighbourhood. Specifically, for a graph like I -> am -> hungry, with a one-to-one mapping between tokens and nodes, I would like an attention mask such as [[1, 1, 0], [0, 1, 1], [0, 0, 1]], meaning that the token I attends to itself and to am, but not to hungry, since hungry is outside its neighbourhood.
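For concreteness, here is a minimal sketch of how I am building that mask (the edge list and token positions are just my toy example):

```python
import torch

# Toy graph I -> am -> hungry, with nodes mapped one-to-one to token positions 0, 1, 2.
num_tokens = 3
edges = [(0, 1), (1, 2)]  # I -> am, am -> hungry

# Start with self-loops (each token attends to itself), then add the graph edges.
mask = torch.eye(num_tokens, dtype=torch.long)
for src, dst in edges:
    mask[src, dst] = 1

print(mask)
# tensor([[1, 1, 0],
#         [0, 1, 1],
#         [0, 0, 1]])

# Add a batch dimension -> shape (batch, seq_len, seq_len)
attention_mask = mask.unsqueeze(0)
```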
Is this behaviour possible? While searching I found the get_extended_attention_mask function, and it does work to pass a 3D attention mask to the model. However, this line says the 3D mask should have shape (batch, from_seq_len, to_seq_len), which suggests to me that this mask is meant for cross-attention.
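Here is a sketch of what I am trying, assuming a BERT-style encoder (bert-base-uncased is just my placeholder) and that the 3D mask is accepted as-is through attention_mask:

```python
import torch
from transformers import BertModel, BertTokenizer

# Sketch: get_extended_attention_mask appears to broadcast a (batch, seq_len, seq_len)
# mask to (batch, 1, seq_len, seq_len) before it is added to the attention scores,
# so I am passing the 3D neighbourhood mask directly as attention_mask.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("I am hungry", return_tensors="pt", add_special_tokens=False)
seq_len = inputs["input_ids"].shape[1]

# Neighbourhood mask from the toy graph above (shape: batch, from_seq_len, to_seq_len).
mask_3d = torch.tensor([[[1, 1, 0],
                         [0, 1, 1],
                         [0, 0, 1]]])
assert mask_3d.shape == (1, seq_len, seq_len)

outputs = model(input_ids=inputs["input_ids"], attention_mask=mask_3d)
```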
Are attention_masks for self-attention or cross-attention?