Creating dataset slow

I don’t have much experience with the datasets library, so please treat this as a reference only. :sweat_smile:
from_ is simple and convenient, but I don’t think it’s suitable for creating large datasets. Unless you add more RAM to your PC…
On the other hand, I think writing the data to disk first and loading it at the end, or writing a dedicated loading script, is better suited to creating huge datasets.
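To make the "create on disk, load at the end" idea concrete, here is a minimal sketch using only the Python standard library. The `generate_records` function, the `train.jsonl` file name, and the record fields are all made up for illustration; the point is just that records are written one at a time, so RAM usage stays flat no matter how many there are.

```python
import json
import os
import tempfile


def generate_records(n):
    """Hypothetical record source; yields one small dict at a time."""
    for i in range(n):
        yield {"id": i, "text": f"example {i}"}


def write_jsonl(records, path):
    """Stream records to a JSON Lines file, one JSON object per line."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


# Write into a temporary directory so the sketch leaves nothing behind
# in the working directory.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "train.jsonl")
written = write_jsonl(generate_records(1000), out_path)
print(f"wrote {written} records to {out_path}")

# Assumption: with the datasets library installed, the finished file
# could then be loaded in one step, for example:
#   from datasets import load_dataset
#   ds = load_dataset("json", data_files=out_path, split="train")
```

I picked JSON Lines only because it is easy to append to; other on-disk formats should work in the same way.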

When reading the dataset, IterableDataset (streaming) should be available.
It saves RAM by loading only a small part of the data at a time. I don’t know how to use it when creating a dataset, though.
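As a rough illustration of why streaming saves RAM, the sketch below reads a JSON Lines file lazily with a plain Python generator. This is not the library's implementation, just the same idea in miniature; `iter_jsonl` and `sample.jsonl` are hypothetical names. In the datasets library itself, passing `streaming=True` to `load_dataset` is what returns an `IterableDataset`.

```python
import json
import os
import tempfile
from itertools import islice


def iter_jsonl(path):
    """Yield one parsed record at a time instead of loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)


# Build a small sample file so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(sample_path, "w", encoding="utf-8") as f:
    for i in range(100):
        f.write(json.dumps({"id": i}) + "\n")

# Only the first three records are ever read and parsed here.
first_three = list(islice(iter_jsonl(sample_path), 3))
print(first_three)
```

Because the generator only advances when asked, the rest of the file is never held in memory at once.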