Is it okay to use a CausalLM with zeros in the attention mask?

Hi,

I’m trying to understand what happens when we call a LlamaForCausalLM model with attention_mask = [1, 1, 1, 1, 0, 0, 0, 0]. For the 5th index (using 0-indexing), would the model apply teacher forcing internally? For instance, what are the internal differences between using the attention_mask above and attention_mask_2 = [1, 1, 1, 1, 1, 0, 0, 0]?
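For concreteness, here is roughly the call I have in mind. This is just a minimal sketch: the checkpoint name and the random input_ids are placeholders, not my actual data.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any Llama-style causal LM would do for the question.
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)

# 8 input tokens, but the mask marks only the first 4 (or 5) as "real".
input_ids = torch.randint(0, model.config.vocab_size, (1, 8))

attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0]])
attention_mask_2 = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])

out_1 = model(input_ids=input_ids, attention_mask=attention_mask)
out_2 = model(input_ids=input_ids, attention_mask=attention_mask_2)

# Both calls return logits of shape (1, 8, vocab_size). My question is how the
# masked-out positions (mask == 0) are treated internally, e.g. whether the
# logits at position 4 in out_1 are still conditioned on the token at that
# position, the way they would be under normal teacher forcing.
print(out_1.logits.shape, out_2.logits.shape)
```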

Does it make sense to call causal models with zeros in the attention mask?