Hi team, I am looking to swap out the self-attention layer in the BERT implementation and just retrain the embeddings, keeping all other parts as-is. Basically, I want to swap out these 20 lines.
Is it possible for me to write my own self-attention module, keep everything else the same, and retrain the BERT embeddings? (I have high confidence that it is, but I'm hoping for instant gratification rather than sifting through thousands of lines of code :D. Ideally, I think I would write my own module like this one and just wire it into the current pipeline.) Just scoping out the effort for this.
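For concreteness, here is a minimal sketch of what I have in mind, assuming the HuggingFace `transformers` `BertModel`, where each encoder layer's self-attention lives at `model.encoder.layer[i].attention.self`. `MySelfAttention` is a hypothetical placeholder (the body below is just vanilla scaled dot-product attention); the real requirement, as far as I can tell, is that it accepts the same `forward()` arguments as `BertSelfAttention` and returns a tuple whose first element is the new hidden states, so the surrounding `BertAttention`/`BertLayer` code keeps working unchanged:

```python
import math

import torch.nn as nn
from transformers import BertModel


class MySelfAttention(nn.Module):
    """Hypothetical drop-in replacement for BertSelfAttention.

    Contract (assumed from the stock implementation): same forward()
    arguments, and return a tuple whose first element is the new
    hidden states of shape (batch, seq_len, hidden_size).
    """

    def __init__(self, config):
        super().__init__()
        self.num_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.query = nn.Linear(config.hidden_size, config.hidden_size)
        self.key = nn.Linear(config.hidden_size, config.hidden_size)
        self.value = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)

    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                past_key_value=None, output_attentions=False, **kwargs):
        b, n, _ = hidden_states.shape

        def split_heads(x):  # (b, n, h*d) -> (b, h, n, d)
            return x.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.query(hidden_states))
        k = split_heads(self.key(hidden_states))
        v = split_heads(self.value(hidden_states))

        # Replace the math below with the custom attention variant.
        scores = q @ k.transpose(-1, -2) / math.sqrt(self.head_dim)
        if attention_mask is not None:
            # BERT passes an additive mask: 0 for tokens to keep,
            # a large negative value for tokens to mask out.
            scores = scores + attention_mask
        probs = self.dropout(scores.softmax(dim=-1))
        context = (probs @ v).transpose(1, 2).reshape(b, n, -1)

        return (context, probs) if output_attentions else (context,)


model = BertModel.from_pretrained("bert-base-uncased")

# Swap the stock self-attention module in every encoder layer,
# then fine-tune as usual.
for layer in model.encoder.layer:
    layer.attention.self = MySelfAttention(model.config)
```

To retrain only the embeddings (plus the freshly initialized attention modules), I imagine one could freeze everything first via `for p in model.parameters(): p.requires_grad = False` and then re-enable `requires_grad` on `model.embeddings` and the swapped-in modules, but I'd appreciate confirmation that this is the intended extension point.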