I read the article “SesameBERT: Attention for Anywhere” and would like to add SENet blocks to the Hugging Face implementation of BERT. The article’s authors released an implementation in TensorFlow, but I would like to use the Hugging Face library in PyTorch.
SENet (Squeeze-and-Excitation) blocks achieved state-of-the-art results in image classification, and they seem promising for NLP as well.
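For reference, the core SE operation (squeeze, excite, scale) can be sketched in PyTorch roughly as below. This is a minimal sketch, not the SesameBERT authors' code. It assumes that BERT's hidden units play the role of SENet "channels" and that the sequence axis plays the role of the spatial axes being pooled. The class name `SqueezeExcitation`, the optional padding `mask` argument, and the default `reduction=16` (the ratio commonly used for SENet) are illustrative choices.

```python
from typing import Optional

import torch
from torch import nn


class SqueezeExcitation(nn.Module):
    """Hypothetical SE block over the feature dimension of a token sequence.

    Assumption: hidden units act as SENet "channels"; tokens act as the
    spatial positions that are averaged away in the squeeze step.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Bottleneck MLP used in the excitation step
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: (batch, seq_len, channels); mask: (batch, seq_len), 1 = real token
        if mask is None:
            squeezed = x.mean(dim=1)  # (batch, channels)
        else:
            m = mask.unsqueeze(-1).to(x.dtype)  # (batch, seq_len, 1)
            # Average only over non-padding tokens
            squeezed = (x * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
        # Excitation: per-channel gates in (0, 1)
        gates = torch.sigmoid(self.fc2(torch.relu(self.fc1(squeezed))))
        # Scale: broadcast the gates over every token in the sequence
        return x * gates.unsqueeze(1)
```

Because the gates lie in (0, 1), the block can only attenuate features, so placing it after a sublayer's output (for example, before the residual addition inside a `BertLayer`) is one plausible spot to try.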
I know that it is possible to modify the `BertLayer()` and `BertEncoder()` classes.
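An alternative to editing `BertLayer` or `BertEncoder` directly is to leave them untouched and apply SE across layers on top of the encoder output. This is only a sketch under an assumption: that a layer-wise SE (each encoder layer treated as one "channel") is close to how the article combines information from all layers. The paper's exact pooling and fusion may differ, so please check it against the article. `LayerSqueezeExcitation` is a hypothetical helper, and averaging the reweighted layers into a single tensor is my own choice.

```python
from typing import Sequence

import torch
from torch import nn


class LayerSqueezeExcitation(nn.Module):
    """Hypothetical SE block whose "channels" are the encoder layers.

    Assumption: input is a sequence of per-layer hidden states, each of shape
    (batch, seq_len, hidden_size), e.g. embeddings plus every encoder layer.
    """

    def __init__(self, num_layers: int, reduction: int = 4):
        super().__init__()
        bottleneck = max(1, num_layers // reduction)
        # Excitation MLP producing one gate per layer
        self.excite = nn.Sequential(
            nn.Linear(num_layers, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, num_layers),
            nn.Sigmoid(),
        )

    def forward(self, hidden_states: Sequence[torch.Tensor]) -> torch.Tensor:
        # Stack to (batch, num_layers, seq_len, hidden_size)
        stacked = torch.stack(tuple(hidden_states), dim=1)
        # Squeeze: one scalar per layer, averaged over tokens and features
        squeezed = stacked.mean(dim=(2, 3))  # (batch, num_layers)
        # Excitation: per-layer gates in (0, 1)
        gates = self.excite(squeezed)  # (batch, num_layers)
        # Scale each layer by its gate, then average the layers together
        weighted = stacked * gates[:, :, None, None]
        return weighted.mean(dim=1)  # (batch, seq_len, hidden_size)
```

In the Hugging Face library, calling `BertModel` with `output_hidden_states=True` returns the embedding output plus one tensor per layer in `hidden_states` (13 tensors for a 12-layer model). Those could be passed to `LayerSqueezeExcitation(num_layers=13)` before a task head. Averaging keeps the fused output on roughly the same scale as a single layer. Returning `weighted` unreduced would let you add further processing per layer instead.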
Do you have any suggestions on how to modify the code so that I can apply the idea from the article?