Add SENet Blocks in Encoding Layers

:rocket: Feature Request

I read the article “SesameBERT: Attention for Anywhere” and would like to add SENet blocks to the Hugging Face implementation. The article’s authors provide a TensorFlow implementation, but I would like to use the library in PyTorch.

Motivation

SENet blocks have obtained state-of-the-art results, and they seem promising for NLP as well.

Your contribution

I know that it is possible to modify the `BertLayer` and `BertEncoder` classes.

Any suggestions on how to modify the code to apply the idea from the article?
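As a starting point, here is a minimal PyTorch sketch of a squeeze-and-excitation block adapted to transformer hidden states of shape `(batch, seq_len, hidden_size)`. This is my own assumption about how to map the SE idea onto BERT (pooling over the sequence dimension and gating the hidden channels); the paper's exact placement and wiring inside the encoder may differ, so treat the class name `SEBlock` and the `reduction` parameter as illustrative.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Hypothetical squeeze-and-excitation block for transformer states.

    Input: (batch, seq_len, hidden_size). The "squeeze" mean-pools over
    the sequence dimension; the "excitation" produces per-channel gates
    in (0, 1) that rescale the hidden features at every token position.
    """

    def __init__(self, hidden_size: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_size // reduction, hidden_size),
            nn.Sigmoid(),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Squeeze: mean-pool across the sequence dim -> (batch, hidden_size)
        pooled = hidden_states.mean(dim=1)
        # Excitation: per-channel gates in (0, 1)
        gates = self.fc(pooled)
        # Scale: broadcast the gates over every token position
        return hidden_states * gates.unsqueeze(1)


x = torch.randn(2, 8, 768)
se = SEBlock(hidden_size=768)
out = se(x)
print(out.shape)  # torch.Size([2, 8, 768])
```

One way to wire this in would be to instantiate such a block inside each `BertLayer` and apply it to the layer's output before it is passed on, but I'd welcome guidance on where the authors intended it to go.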