Hello,
I am trying to implement a paper with the following approach:
The idea is to fine-tune a BERT model on a language-modeling task (next-token prediction), using a triangular (causal) attention mask to enforce left-to-right language modeling.
I would like to know if it's possible to fine-tune a BERT model with a triangular mask, starting from a BERT pre-trained with a full (bidirectional) attention mask. If so, how do I do it in the implementation?
Is there a simple way to do it, or do I need to modify the source code?
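For context, here is a minimal sketch of what I have in mind, assuming the Hugging Face `transformers` API, where setting `is_decoder=True` makes BERT's self-attention apply the triangular (causal) mask. I use a tiny randomly-initialized config just to keep the snippet self-contained; with real weights one would presumably do `BertLMHeadModel.from_pretrained("bert-base-uncased", is_decoder=True)` instead:

```python
# Sketch: turn a BERT encoder into a left-to-right LM by enabling
# the causal (triangular) attention mask via is_decoder=True.
import torch
from transformers import BertConfig, BertLMHeadModel

# Tiny random config so the example runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    is_decoder=True,  # every self-attention layer now applies a triangular mask
)
model = BertLMHeadModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
# Passing labels=input_ids makes the model compute the shifted
# next-token-prediction (causal LM) loss.
outputs = model(input_ids=input_ids, labels=input_ids)
print(outputs.logits.shape)  # → torch.Size([1, 8, 100])
```

Is this the intended way to do it, or is the triangular masking supposed to be done by hand on `attention_mask`?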
Thank you so much for your help!