NSP + WWM raises error when training BertForPreTraining

Hi,
I was trying to figure out how whole word masking affects BERT pre-training.
So I used TextDatasetForNextSentencePrediction with DataCollatorForWholeWordMask.
But it gave me the following error during the training phase:

/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
   1614         if isinstance(k, str):
   1615             inner_dict = {k: v for (k, v) in self.items()}
-> 1616             return inner_dict[k]
   1617         else:
   1618             return self.to_tuple()[k]

KeyError: 'loss'

So I looked it up, and it happens when you don't pass the appropriate labels to the model.
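
To double check that, I ran a single collated batch through the model by hand (using the model, dataset and data_collator from the snippets below). As far as I can tell, BertForPreTraining only returns a loss when both labels and next_sentence_label are present:

# Quick manual check (dataset, data_collator and model are the ones defined below).
batch = data_collator([dataset[0], dataset[1]])
print(batch.keys())   # with DataCollatorForWholeWordMask I only see input_ids and labels
outputs = model(**batch)
print(outputs.loss)   # comes back as None, which is why outputs["loss"] raises the KeyError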

Then I dug deeper, and it turns out DataCollatorForWholeWordMask doesn't handle the special tokens the same way DataCollatorForLanguageModeling does.

Is there any other way to combine whole word masking and next sentence prediction?
Also, since DataCollatorForWholeWordMask inherits from DataCollatorForLanguageModeling, isn't it supposed to be a drop-in replacement for it?

Dataset Creation
from transformers import TextDatasetForNextSentencePrediction

# Builds NSP examples from a plain-text file (one sentence per line, blank line between documents).
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=bert_cased_tokenizer,
    file_path="./tmp.txt",
    block_size=256,
)
Data Collator
from transformers import DataCollatorForWholeWordMask

data_collator = DataCollatorForWholeWordMask(
    tokenizer=bert_cased_tokenizer,
    mlm=True,
    mlm_probability=0.15,
)
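
(For comparison, this is roughly how I checked what the two collators return for the same examples; the keys differ for me:)

from transformers import DataCollatorForLanguageModeling

examples = [dataset[0], dataset[1]]

mlm_collator = DataCollatorForLanguageModeling(
    tokenizer=bert_cased_tokenizer,
    mlm=True,
    mlm_probability=0.15,
)

# On my setup the first print also shows token_type_ids and next_sentence_label
# coming from TextDatasetForNextSentencePrediction, the second only input_ids and labels.
print(sorted(mlm_collator(examples).keys()))
print(sorted(data_collator(examples).keys()))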
Training
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=PATHS["model"]["cased"]["mlm-nsp"]["training"]["local"],
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

trainer.train()
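
In case it helps, this is the workaround I'm experimenting with right now: a small subclass (the name DataCollatorForWholeWordMaskAndNSP is just something I made up) that lets the parent class do the whole word masking and then puts token_type_ids and next_sentence_label back into the batch. I'm not sure this is the intended way to do it, so any pointers are welcome.

import torch
from transformers import DataCollatorForWholeWordMask

class DataCollatorForWholeWordMaskAndNSP(DataCollatorForWholeWordMask):
    def __call__(self, examples):
        # The parent class does the whole word masking but only returns input_ids and labels.
        batch = super().__call__(examples)
        seq_len = batch["input_ids"].size(1)

        # Put token_type_ids back, padded with 0 up to the collated sequence length.
        if "token_type_ids" in examples[0]:
            batch["token_type_ids"] = torch.stack([
                torch.cat([
                    torch.as_tensor(e["token_type_ids"], dtype=torch.long),
                    torch.zeros(seq_len - len(e["token_type_ids"]), dtype=torch.long),
                ])
                for e in examples
            ])

        # Put next_sentence_label back so BertForPreTraining can compute its loss.
        if "next_sentence_label" in examples[0]:
            batch["next_sentence_label"] = torch.tensor(
                [int(e["next_sentence_label"]) for e in examples], dtype=torch.long
            )
        return batch

data_collator = DataCollatorForWholeWordMaskAndNSP(
    tokenizer=bert_cased_tokenizer,
    mlm=True,
    mlm_probability=0.15,
)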