Hi, I was looking through the source code for the T5 model and noticed that the decoder reuses the same mask for cross-attention (encoder_attention_mask) as the attention_mask supplied for self-attention in the encoder. Is it possible to supply different masks for these two cases? I am trying to build a knowledge mechanism that requires this.
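For concreteness, here is a minimal sketch of what I am trying to do, assuming the encoder and decoder stacks can be called separately as model.encoder / model.decoder and that the decoder stack accepts an encoder_attention_mask argument. The cross_attention_mask below is just a hypothetical mask I build for illustration, not something from the library:

```python
from transformers import T5Model, T5Tokenizer

model = T5Model.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

enc = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
dec = tokenizer("Studies show that", return_tensors="pt")  # stand-in for shifted decoder inputs

# Encoder self-attention uses the ordinary input attention mask.
encoder_outputs = model.encoder(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
)

# Hypothetical cross-attention mask with the same shape as enc.attention_mask,
# but different content, e.g. hiding some source tokens from the decoder only.
cross_attention_mask = enc.attention_mask.clone()
cross_attention_mask[:, -2:] = 0  # illustration only

# Decoder: its own mask for self-attention, the custom mask for cross-attention.
decoder_outputs = model.decoder(
    input_ids=dec.input_ids,
    attention_mask=dec.attention_mask,
    encoder_hidden_states=encoder_outputs.last_hidden_state,
    encoder_attention_mask=cross_attention_mask,
)

print(decoder_outputs.last_hidden_state.shape)
```

Is calling the two stacks separately like this the intended way to do it, or is there a supported path through the top-level forward?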
Thank you.