How to set multiple files in `LineByLineTextDataset`?

In the Colab notebook “01_how-to-train.ipynb”, the training dataset is defined with only a single `file_path`, as shown below:

dataset = LineByLineTextDataset(

How can I give multiple files to this dataset? Thank you!

The tutorial “How to train a new language model from scratch using Transformers and Tokenizers” defines a new dataset class and uses the following line

src_files = Path("./data/").glob("*-eval.txt") if evaluate else Path("./data/").glob("*-train.txt")

to achieve this. However, that tutorial does not use the new `Trainer` approach that the Colab notebook does.
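One workaround I am considering, in case it helps frame the question: merge all matching files into a single file first, then pass that one path as `file_path`. This is only a minimal sketch. The helper name `merge_text_files`, the output path, and `block_size=128` are my own assumptions, not anything taken from the notebook or the tutorial.

```python
from pathlib import Path


def merge_text_files(data_dir, pattern, out_path):
    """Concatenate all files in data_dir matching pattern into out_path.

    Hypothetical helper: writes one non-empty line per example, which is
    the layout a line-by-line dataset expects.
    """
    out_path = Path(out_path)
    # Collect sources before opening the output, and skip the output file
    # itself in case it lives in data_dir and matches the pattern.
    sources = [
        p for p in sorted(Path(data_dir).glob(pattern))
        if p.resolve() != out_path.resolve()
    ]
    with out_path.open("w", encoding="utf-8") as out:
        for src in sources:
            with src.open(encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line:  # drop blank lines
                        out.write(line + "\n")
    return out_path


# Assumed usage with the notebook's dataset class (not run here):
# merged = merge_text_files("./data", "*-train.txt", "./all-train.txt")
# dataset = LineByLineTextDataset(
#     tokenizer=tokenizer, file_path=str(merged), block_size=128
# )
```

Another option might be to build one `LineByLineTextDataset` per file and wrap them in `torch.utils.data.ConcatDataset`, but I have not verified that this works with `Trainer`.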
