Modification of self-attention in BERT without pretraining

M-a-j-a · May 19, 2023, 8:58am

Hello!

I need to turn the bidirectional self-attention layers in BERT into unidirectional ones. From what I understand, I just need to apply a so-called triangular attention mask to the matrix of attention scores in the source code. However, in that case I would need to pretrain the model before using it, and this is a problem given my limited resources. Do you have any idea how to modify the attention without changing the source code?
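Here is a minimal pure-Python sketch of the triangular (causal) mask described above. It is for illustration only and is not BERT's or any library's code. The helper names `causal_mask`, `masked_softmax` and `causal_self_attention` are hypothetical. Scores for positions j > i are set to -inf before the softmax, so each token attends only to itself and earlier tokens. Libraries such as Hugging Face transformers typically add a large negative number to the masked scores instead of -inf, which has the same effect after the softmax.

```python
import math


def causal_mask(seq_len):
    """Lower-triangular 0/1 mask: row i allows attention to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax in which masked positions are set to -inf first."""
    weights = []
    for row, mask_row in zip(scores, mask):
        masked = [s if m else float("-inf") for s, m in zip(row, mask_row)]
        peak = max(masked)  # finite, because the diagonal is never masked
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of vectors (one per token). Returns (outputs, weights).
    """
    n, d = len(q), len(q[0])
    scores = [
        [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    weights = masked_softmax(scores, causal_mask(n))
    outputs = [
        [sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        for i in range(n)
    ]
    return outputs, weights


# Toy example: 3 tokens with 2-dimensional vectors (made-up numbers).
q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = causal_self_attention(q, k, v)
print(w[0])  # the first token can only attend to itself
```

Because the masked weights are exactly zero, changing the key or value of the last token leaves the outputs for the earlier tokens unchanged. This only changes the masking, though. The weights of a pretrained BERT checkpoint were learned with bidirectional attention, which is why the question of pretraining comes up. As far as I know, in Hugging Face transformers, setting `is_decoder=True` in `BertConfig` makes `BertModel` build a causal mask internally without any edits to the source code. The pretrained weights are still the bidirectional ones.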

Thank you in advance,

katarinayuan · June 15, 2023, 9:21pm

Interested in this question too :)

