I’m trying to fine-tune a transformer model on a simultaneous MLM + NSP task. I have some older code examples showing that there used to be a
DataCollatorForNextSentencePrediction class that could be relied on in this configuration. However, that class no longer exists and has been replaced by
DataCollatorForLanguageModeling, as stated in this issue: Why was DataCollatorForNextSentencePrediction removed ? · Issue #9416 · huggingface/transformers · GitHub
I’m nevertheless a bit confused, because the source code for the
DataCollatorForLanguageModeling class shows no parameter for controlling the amount of NSP, while there is a float (mlm_probability) controlling what fraction of tokens get masked during training.
I was wondering whether someone could give me a clearer picture of this class and of how to include Next Sentence Prediction as an auxiliary task during MLM training.
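For concreteness, here is a rough sketch of what I assume a combined collator would have to do, based on my reading of BERT’s pretraining setup; MASK_ID, the -100 ignore index, and the NSP label convention (0 = B follows A, 1 = B is random) are my assumptions, not the actual library implementation:

```python
import random

MASK_ID = 103   # placeholder [MASK] token id (BERT's default vocab); assumption
IGNORE = -100   # label value ignored by the MLM loss; assumption

def collate_mlm_nsp(examples, mlm_probability=0.15, seed=None):
    """Sketch of a combined collator: masks tokens for MLM and passes
    through the next_sentence_label already attached by the dataset."""
    rng = random.Random(seed)
    batch = {"input_ids": [], "labels": [], "next_sentence_label": []}
    for ex in examples:
        input_ids, labels = [], []
        for tok in ex["input_ids"]:
            if rng.random() < mlm_probability:
                input_ids.append(MASK_ID)  # corrupt the input
                labels.append(tok)         # predict the original token
            else:
                input_ids.append(tok)
                labels.append(IGNORE)      # position not scored by the loss
        batch["input_ids"].append(input_ids)
        batch["labels"].append(labels)
        # the NSP label comes from how the dataset paired the sentences,
        # so the collator only forwards it
        batch["next_sentence_label"].append(ex["next_sentence_label"])
    return batch
```

If I understand the linked issue correctly, this is why the collator needs no NSP knob: the sentence pairing (and thus the NSP label) is decided when the dataset is built, and the collator only does the masking. Is that the intended design?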
Thanks a lot!