Is it okay to use a CausalLM with zeros in the attention mask?

Hi,

I’m trying to understand what happens when we call a LlamaForCausalLM model with attention_mask = [1, 1, 1, 1, 0, 0, 0, 0]. For the 5th index (using 0-indexing), would the model apply teacher forcing internally? For instance, what are the internal differences between using the attention_mask above and attention_mask_2 = [1, 1, 1, 1, 1, 0, 0, 0]?
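For concreteness, here is roughly the call I have in mind. This is just a minimal sketch: the checkpoint name and the random input_ids are placeholders, not my actual data.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any Llama-style causal LM would do for the question.
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)

# 8 input tokens, but the mask marks only the first 4 (or 5) as "real".
input_ids = torch.randint(0, model.config.vocab_size, (1, 8))

attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0]])
attention_mask_2 = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])

out_1 = model(input_ids=input_ids, attention_mask=attention_mask)
out_2 = model(input_ids=input_ids, attention_mask=attention_mask_2)

# Both calls return logits of shape (1, 8, vocab_size). My question is how the
# masked-out positions (mask == 0) are treated internally, e.g. whether the
# logits at position 4 in out_1 are still conditioned on the token at that
# position, the way they would be under normal teacher forcing.
print(out_1.logits.shape, out_2.logits.shape)
```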

Does it make sense to call causal models with zeros in the attention mask?