Hi,
The Transformers library is not really aimed at this use case. It's not meant to be a modular toolbox of building blocks, but rather is aimed at people who want to use and fine-tune pre-trained models.
Of course, you could fork the Transformers library and tweak modeling_bert.py yourself, but if you want to hack around I'd recommend checking out other projects such as those from Phil Wang, or facebookresearch/xformers on GitHub (hackable and optimized Transformer building blocks supporting composable construction).