transformers.LineByLineTextDataset is deprecated, and the deprecation message points to the run_mlm.py script (examples/pytorch/language-modeling/run_mlm.py on the main branch of the huggingface/transformers GitHub repository) for ways to preprocess the data.
So, you can use datasets.load_from_disk to load the dataset, apply the transforms from that script to it (the .map calls), and then pass the result to Trainer.
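For illustration, here is a minimal sketch of the line-by-line variant of that preprocessing. It is modelled on the approach in run_mlm.py rather than copied from it, and it makes a few assumptions: the saved object is a single Dataset (not a DatasetDict) with a column named "text", and the helper names make_line_tokenizer and prepare_dataset, as well as the max_length default, are hypothetical.

```python
def make_line_tokenizer(tokenizer, text_column="text", max_length=128):
    """Build a batched function for Dataset.map that tokenizes each non-empty line.

    `text_column` and `max_length` are assumed values; adjust them to your data.
    """
    def tokenize_lines(examples):
        # Drop empty and whitespace-only lines, as LineByLineTextDataset did.
        lines = [
            line for line in examples[text_column]
            if line and not line.isspace()
        ]
        return tokenizer(
            lines,
            truncation=True,
            max_length=max_length,
            return_special_tokens_mask=True,
        )
    return tokenize_lines


def prepare_dataset(path, tokenizer, text_column="text", max_length=128):
    """Load a dataset saved with save_to_disk and tokenize it line by line."""
    # Imported here so the helper above can be used without `datasets` installed.
    from datasets import load_from_disk

    dataset = load_from_disk(path)  # assumed to be a single Dataset
    return dataset.map(
        make_line_tokenizer(tokenizer, text_column, max_length),
        batched=True,
        # Remove the original columns: filtering blank lines changes the row count.
        remove_columns=dataset.column_names,
    )
```

The returned dataset can then be passed to Trainer as train_dataset, typically together with DataCollatorForLanguageModeling so that masking is applied when batches are built.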