Is there any method to speed up generating examples?

Hello, is there any method to speed up generating examples? I have almost 60,000,000 examples in my corpus, and it seems to take several hours to load completely.

Hi ! This step is single-processed right now. Once we add parallelism, it will be much faster :slight_smile:

In the meantime, you can try splitting your dataset into smaller ones, and then use concatenate_datasets() to merge them back into a single dataset.