Modification of self-attention in BERT without pretraining


I need to turn the bidirectional self-attention layers in BERT into unidirectional ones. From what I understand, I just need to apply a so-called triangular (causal) attention mask to the attention-score matrix in the source code. However, in that case I would need to pretrain the model before using it, which is a problem given my limited resources. Do you have any idea how to modify the attention without changing the source code?
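To make the question concrete, here is a toy NumPy sketch of the masking step I mean: add -inf to every score above the diagonal before the softmax, so each position can only attend to itself and earlier positions. This is just an illustration of the idea, not BERT's actual code, and the function names are my own:

```python
import numpy as np

def apply_causal_mask(scores):
    """Mask a square attention-score matrix so that position i
    cannot attend to positions j > i (scores set to -inf)."""
    seq_len = scores.shape[-1]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = scores.copy()
    masked[future] = -np.inf
    return masked

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = np.zeros((4, 4))  # toy uniform attention scores
weights = softmax(apply_causal_mask(scores))
# exp(-inf) = 0, so row i puts nonzero weight only on columns 0..i
```

If you use HuggingFace Transformers, I believe there are ways to get this without editing the source (e.g. a decoder-style config flag, or passing a per-position attention mask to `forward`), but you would have to check the docs for your version.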

Thank you in advance,

Interested in the question too :)