Hi, I was looking through the source code for the T5 model and noticed that the decoder reuses the same mask for cross-attention (encoder_attention_mask) as the attention_mask supplied for self-attention in the encoder. Is it possible to supply different masks for these two cases? I am trying to build a knowledge mechanism that requires this.
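For concreteness, here is a minimal sketch of what I am trying to do, assuming the encoder and decoder stacks can be called separately as model.encoder / model.decoder and that the decoder stack accepts an encoder_attention_mask argument. The cross_attention_mask below is just a hypothetical mask I build for illustration, not something from the library:

```python
from transformers import T5Model, T5Tokenizer

model = T5Model.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

enc = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
dec = tokenizer("Studies show that", return_tensors="pt")  # stand-in for shifted decoder inputs

# Encoder self-attention uses the ordinary input attention mask.
encoder_outputs = model.encoder(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
)

# Hypothetical cross-attention mask with the same shape as enc.attention_mask,
# but different content, e.g. hiding some source tokens from the decoder only.
cross_attention_mask = enc.attention_mask.clone()
cross_attention_mask[:, -2:] = 0  # illustration only

# Decoder: its own mask for self-attention, the custom mask for cross-attention.
decoder_outputs = model.decoder(
    input_ids=dec.input_ids,
    attention_mask=dec.attention_mask,
    encoder_hidden_states=encoder_outputs.last_hidden_state,
    encoder_attention_mask=cross_attention_mask,
)

print(decoder_outputs.last_hidden_state.shape)
```

Is calling the two stacks separately like this the intended way to do it, or is there a supported path through the top-level forward?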
Thank you.