I am pretraining with MLM + NSP (`BertForPreTraining`), and fine-tuning with NSP (`BertForNextSentencePrediction`).
Is there an elegant way in which I can keep the NSP head from the pretrained model?
Thanks!
Validated that loading the checkpoint with `BertForNextSentencePrediction` does indeed keep the weights of the NSP head from `BertForPreTraining`.
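For reference, a minimal sketch of the flow that was verified; the checkpoint directory name is illustrative and the actual training loops are omitted:

```python
from transformers import BertForPreTraining, BertForNextSentencePrediction

# Pretraining phase: BertForPreTraining carries both the MLM and NSP heads.
pretraining_model = BertForPreTraining.from_pretrained("bert-base-uncased")
# ... run the MLM + NSP pretraining loop here ...
pretraining_model.save_pretrained("my-pretrained-bert")  # illustrative path

# Fine-tuning phase: load the same checkpoint with the NSP-only class.
# The NSP head is named `cls.seq_relationship` in both classes, so its
# weights are restored; only the MLM head (`cls.predictions`) is dropped.
nsp_model = BertForNextSentencePrediction.from_pretrained("my-pretrained-bert")
```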