Hi, I am experimenting with some ideas for applying different attention masks at different transformer layers. Currently I am trying to implement a customized model that accepts two attention masks, but I'm wondering whether there is a simpler way to do this. Any suggestions are welcome. Thanks!
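For reference, here is a minimal sketch of the kind of thing I have in mind, assuming PyTorch's `nn.TransformerEncoderLayer` (the class name `TwoMaskEncoder` and the "first half / second half" split are just illustrative choices, not a real API):

```python
import torch
import torch.nn as nn

class TwoMaskEncoder(nn.Module):
    """Illustrative encoder: the first half of the layers uses mask_a,
    the second half uses mask_b. Both masks are (seq_len, seq_len) bool
    tensors where True marks a position that may NOT be attended to."""

    def __init__(self, d_model=64, nhead=4, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x, mask_a, mask_b):
        half = len(self.layers) // 2
        for i, layer in enumerate(self.layers):
            # Pick a per-layer mask instead of one global mask.
            mask = mask_a if i < half else mask_b
            x = layer(x, src_mask=mask)
        return x

x = torch.randn(2, 5, 64)                 # (batch, seq_len, d_model)
causal = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
full = torch.zeros(5, 5, dtype=torch.bool)  # all-False = no masking
out = TwoMaskEncoder()(x, causal, full)
print(out.shape)
```

This works, but it means hand-rolling the layer loop instead of reusing `nn.TransformerEncoder`, which only takes a single `mask` argument. That is the boilerplate I was hoping to avoid.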