Hi team, I am looking to swap out the self-attention layer in the BERT implementation and just retrain the embeddings, keeping all other parts as-is. Basically, I want to swap out these 20 lines.
Is it possible for me to write my own self-attention module, keep everything else the same, and retrain the BERT embeddings? (I have high confidence that it is, but I'm hoping for instant gratification rather than sifting through thousands of lines of code :D. Ideally, I think I would write my own module like this one and just wire it into the current pipeline.) Just scoping out the effort for this.
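For concreteness, here is a minimal sketch of what I have in mind, assuming the HuggingFace `transformers` `BertModel`, where each encoder layer's self-attention lives at `model.encoder.layer[i].attention.self`. `MySelfAttention` is a hypothetical placeholder (the body below is just vanilla scaled dot-product attention); the real requirement, as far as I can tell, is that it accepts the same `forward()` arguments as `BertSelfAttention` and returns a tuple whose first element is the new hidden states, so the surrounding `BertAttention`/`BertLayer` code keeps working unchanged:

```python
import math

import torch.nn as nn
from transformers import BertModel


class MySelfAttention(nn.Module):
    """Hypothetical drop-in replacement for BertSelfAttention.

    Contract (assumed from the stock implementation): same forward()
    arguments, and return a tuple whose first element is the new
    hidden states of shape (batch, seq_len, hidden_size).
    """

    def __init__(self, config):
        super().__init__()
        self.num_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.query = nn.Linear(config.hidden_size, config.hidden_size)
        self.key = nn.Linear(config.hidden_size, config.hidden_size)
        self.value = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)

    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                past_key_value=None, output_attentions=False, **kwargs):
        b, n, _ = hidden_states.shape

        def split_heads(x):  # (b, n, h*d) -> (b, h, n, d)
            return x.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.query(hidden_states))
        k = split_heads(self.key(hidden_states))
        v = split_heads(self.value(hidden_states))

        # Replace the math below with the custom attention variant.
        scores = q @ k.transpose(-1, -2) / math.sqrt(self.head_dim)
        if attention_mask is not None:
            # BERT passes an additive mask: 0 for tokens to keep,
            # a large negative value for tokens to mask out.
            scores = scores + attention_mask
        probs = self.dropout(scores.softmax(dim=-1))
        context = (probs @ v).transpose(1, 2).reshape(b, n, -1)

        return (context, probs) if output_attentions else (context,)


model = BertModel.from_pretrained("bert-base-uncased")

# Swap the stock self-attention module in every encoder layer,
# then fine-tune as usual.
for layer in model.encoder.layer:
    layer.attention.self = MySelfAttention(model.config)
```

To retrain only the embeddings (plus the freshly initialized attention modules), I imagine one could freeze everything first via `for p in model.parameters(): p.requires_grad = False` and then re-enable `requires_grad` on `model.embeddings` and the swapped-in modules, but I'd appreciate confirmation that this is the intended extension point.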