Attention mask shape (custom attention masking)

The docs (e.g. for LlamaModel) suggest that the attention_mask passed to forward should be 2-dimensional.

However, looking at the source code, it seems possible to provide a 4D mask instead, and this overrides the standard (e.g. causal) mask.

Is this correct? Should this be documented? Is there anything to watch out for when doing this? (I’m interested in providing a custom attention masking pattern to the Llama architecture).
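For context, the 4D-mask path can be sanity-checked with a sketch like the one below. The model name, tokenizer setup and tolerance are illustrative assumptions, not from this thread; the idea is simply to pass an explicit 4D additive causal mask of shape (batch, 1, query_len, key_len) and compare it against the default causal path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative setup -- any Llama checkpoint should behave the same way
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello world", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# 4D additive mask of shape (batch, 1, query_len, key_len):
# 0.0 where attention is allowed, a very negative value where it is blocked
causal = torch.tril(torch.ones(seq_len, seq_len))
mask_4d = (1.0 - causal)[None, None, :, :] * torch.finfo(model.dtype).min

with torch.no_grad():
    out_default = model(**inputs)
    out_custom = model(input_ids=inputs["input_ids"], attention_mask=mask_4d)

# a purely causal 4D mask should reproduce the default behaviour (up to numerical noise)
print(torch.allclose(out_default.logits, out_custom.logits, atol=1e-4))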


Hi,
yes, I passed a custom 4D mask and it worked 🙂 (confirmed by checking the attention scores).

Related to your question: did you perhaps find out whether the custom attention mask should be additive or binary?
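For reference, here is a minimal sketch of the two conventions as I understand them (the shapes follow the usual transformers convention; treat this as an assumption rather than documented behaviour):

import torch

seq_len = 6

# 2D binary mask (what the docs describe): shape (batch, seq_len),
# 1 = real token, 0 = padding; the model still applies its own causal mask internally
binary_mask = torch.ones(1, seq_len, dtype=torch.long)
binary_mask[0, -2:] = 0  # e.g. the last two positions are padding

# 4D additive mask (what overrides the default): shape (batch, 1, query_len, key_len),
# 0.0 where attention is allowed, a large negative value where it is blocked
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
additive_mask = torch.full((1, 1, seq_len, seq_len), -1e9)
additive_mask = additive_mask.masked_fill(allowed, 0.0)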


Hey,
I want to pass a custom mask too. Any chance you can help me out?
How did you pass it?
(Again, to override the standard causal mask.)

Thanks in advance


Hi @spranav1205,
so I am using LlamaForCausalLM; here is my code snippet:

import torch

# assumes `model`, `tokenizer`, `device`, `msg` and `no_masking_length` are defined elsewhere

def generate_additive_attention_mask(no_masking_length, total_length):
    # no_masking_length is the number of leading tokens that should attend to each other without causal masking
    mask = torch.tril(torch.ones(total_length, total_length)).to(device)
    mask[:no_masking_length, :no_masking_length] = 1
    mask = mask.unsqueeze(0).unsqueeze(0)  # add batch and num_attention_heads dims -> (1, 1, total_length, total_length)
    # Llama-2-7b-chat uses additive masking: 0 where attention is allowed, a large negative number where it is blocked
    mask = (1 - mask) * -1e9
    return mask


inputs = tokenizer(msg, return_tensors="pt").to(device)
seq_len = inputs["input_ids"].shape[1]
inputs["attention_mask"] = generate_additive_attention_mask(no_masking_length, seq_len)
outputs = model(**inputs)
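One caveat (my own note, not from the snippet above): if the model runs in float16 or bfloat16, a hard-coded -1e9 can become -inf when the mask is cast to the model's dtype, which some attention implementations handle poorly. Using the dtype's minimum is a common safer choice, e.g.:

mask = (1 - mask) * torch.finfo(model.dtype).min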