Pre-training: ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []

Hello,
I am following the tutorial here to pre-train a BERT model, and somehow I get the following error:
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []

The code is taken almost verbatim from the tutorial:

from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = BertForMaskedLM.from_pretrained('bert-base-multilingual-uncased')
bert_tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
streaming_dataset = load_dataset('text', data_files='./train.txt', streaming=True, split="train")

training_args = TrainingArguments(
    output_dir='/project/bert/model',
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    evaluation_strategy='steps',
    eval_steps=100,
    logging_steps=100,
    num_train_epochs=3,
    save_strategy='steps',
    save_steps=500,
    max_steps=1000,
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=bert_tokenizer,
    mlm=True,
    mlm_probability=0.2
)

trainer = Trainer(
    model=model,
    tokenizer=bert_tokenizer,
    args=training_args,
    data_collator=data_collator,
    train_dataset=streaming_dataset,
)

Does anyone have an idea what is causing this?

You are passing the raw text dataset straight to the Trainer. The `load_dataset('text', ...)` call only yields examples with a plain "text" field; nothing has been tokenized, so the data collator finds no `input_ids` and raises the ValueError you see. You need to run the tokenizer over the dataset before handing it to the Trainer.