Fine-tune BERT with triangular mask

Hello,

I am trying to implement a paper with the following approach:

[figure from the paper illustrating the approach]

The idea is to fine-tune a BERT model for a language modeling task (next-token prediction), with a triangular attention mask in order to enforce left-to-right language modeling.
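
Just to be explicit about what I mean by a triangular mask, here is a small plain-PyTorch sketch: a lower-triangular mask where position i can only attend to positions ≤ i.

```python
import torch

# Lower-triangular (causal) mask for a sequence of length 5:
# row i has ones only at columns 0..i, so token i cannot see future tokens.
seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len))
print(causal_mask)
# tensor([[1., 0., 0., 0., 0.],
#         [1., 1., 0., 0., 0.],
#         [1., 1., 1., 0., 0.],
#         [1., 1., 1., 1., 0.],
#         [1., 1., 1., 1., 1.]])
```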

I would like to know if it is possible to fine-tune BERT with a triangular mask starting from a BERT pre-trained with a full (square) attention mask. If so, how would I do it in the implementation?
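
For context, here is roughly what I was imagining with Hugging Face Transformers, using the `is_decoder` flag of `BertConfig` together with `BertLMHeadModel`. I am not sure this is the right way to get the triangular mask, so please correct me if not:

```python
import torch
from transformers import BertConfig, BertLMHeadModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Load the pre-trained (bidirectional) BERT weights, but flag the model as a
# decoder so that, as far as I understand, a causal/triangular mask is applied.
config = BertConfig.from_pretrained("bert-base-uncased")
config.is_decoder = True

model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

# Toy forward pass: BertLMHeadModel shifts the labels internally,
# so the loss should correspond to next-token prediction.
inputs = tokenizer("I am fine-tuning BERT left to right", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```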

Is there a simple way to do it, or do I need to modify the source code?
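
Alternatively, I wondered whether I could avoid touching the source entirely by passing an explicit lower-triangular mask of shape (batch_size, seq_len, seq_len) as the `attention_mask`. This is just a guess on my part:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("an example sentence", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# Per-example triangular mask (batch_size, seq_len, seq_len); my hope is that
# the model broadcasts it over attention heads instead of the usual 2-D padding mask.
causal_mask = torch.tril(torch.ones(1, seq_len, seq_len, dtype=torch.long))

outputs = model(input_ids=inputs["input_ids"], attention_mask=causal_mask)
print(outputs.logits.shape)  # (1, seq_len, vocab_size)
```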

Thank you so much for your help!