AttributeError for Text Dataset For Next Sentence Prediction: no attribute 'documents'


I am preparing some data for BERT fine-tuning for Next Sentence Prediction.
In my understanding, the way to go is:

  1. Pass the data through the TextDatasetForNextSentencePrediction class
  2. Instantiate DataCollatorForLanguageModeling (as DataCollatorForNextSentencePrediction has been removed)
  3. Pass both to the Trainer
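
For reference, the three steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a confirmed recipe: it assumes a transformers version that still ships TextDatasetForNextSentencePrediction, that the input file holds one sentence per line with a blank line between documents, and that BertForPreTraining is the model being trained. The helper names write_nsp_corpus and build_nsp_trainer, the output directory, and the block size are hypothetical.

```python
from pathlib import Path


def write_nsp_corpus(documents, path):
    """Write documents as one sentence per line, with a blank line between documents.

    Assumption: this is the input layout TextDatasetForNextSentencePrediction expects.
    """
    text = "\n\n".join("\n".join(sentences) for sentences in documents)
    Path(path).write_text(text + "\n", encoding="utf-8")
    return path


def build_nsp_trainer(corpus_path, output_dir="nsp-out", block_size=128):
    """Assemble dataset, collator and Trainer (hypothetical helper, untuned defaults)."""
    # Imported lazily so the corpus helper above works without transformers installed.
    from transformers import (
        BertForPreTraining,
        BertTokenizerFast,
        DataCollatorForLanguageModeling,
        TextDatasetForNextSentencePrediction,
        Trainer,
        TrainingArguments,
    )

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

    # Step 1: build sentence-pair examples (with NSP labels) from the text file.
    train_dataset = TextDatasetForNextSentencePrediction(
        tokenizer=tokenizer, file_path=str(corpus_path), block_size=block_size
    )

    # Step 2: the collator adds masked-language-modeling labels; the NSP labels
    # produced by the dataset are expected to be carried through in the batch.
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )

    # Step 3: hand both to the Trainer. BertForPreTraining is an assumption here;
    # it has both an MLM head and an NSP head.
    model = BertForPreTraining.from_pretrained("bert-base-cased")
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1)
    return Trainer(
        model=model,
        args=args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )
```

Calling write_nsp_corpus([["First sentence.", "Second sentence."], ["Another document."]], "corpus.txt") and then build_nsp_trainer("corpus.txt").train() would be the intended usage, though I have not confirmed it against the current library version.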

However, there seems to be a bug with the documents attribute of the TextDatasetForNextSentencePrediction class. On some runs the attribute is set, but on others I get AttributeError: 'TextDatasetForNextSentencePrediction' object has no attribute 'documents'.
I also see a warning saying that this code will soon be replaced. I was therefore wondering whether this is still the pipeline you recommend, or whether I should go another way.

Below are sample data and minimal code to reproduce the error.

[Screenshot of the sample data: Screenshot 2021-07-26 at 12.38.50]

from transformers import TextDatasetForNextSentencePrediction
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
train_dataset = TextDatasetForNextSentencePrediction(

docs = train_dataset.documents