Caching progress of Dataset.from_generator

I’m using Dataset.from_generator to build large datasets.

Assuming the builder writes incrementally to disk as the dataset is constructed, is there a way to automatically resume in case of an error that disrupts construction?

Hi! That’s not currently possible, unfortunately.

Maybe you can create multiple Dataset objects instead. That way, if one crashes, the others can still be built, and you only have to redo the one that failed.
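Here is a rough sketch of what that could look like, assuming your data can be split into independent shards. It is not a built-in resume feature of `datasets`. The helper names (`build_in_shards`, `save_hf_shard`, `load_all`), the `shard-00000` directory naming and the toy generator are all made up for illustration. Each shard is built with `Dataset.from_generator`, written with `save_to_disk` to a temporary directory, and renamed only once it is complete, so a rerun skips every shard that already finished.

```python
import os
import shutil


def build_in_shards(shard_ids, out_dir, save_shard):
    """Call save_shard(shard_id, path) for every shard that is not finished yet.

    A shard counts as finished only once its directory exists under its final
    name, so a crash mid-write leaves just a temporary directory behind.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for shard_id in shard_ids:
        final_path = os.path.join(out_dir, f"shard-{shard_id:05d}")
        if not os.path.isdir(final_path):
            tmp_path = final_path + ".tmp"
            # Drop any half-written leftovers from a previous crashed run.
            shutil.rmtree(tmp_path, ignore_errors=True)
            save_shard(shard_id, tmp_path)
            # The rename marks the shard as complete.
            os.rename(tmp_path, final_path)
        paths.append(final_path)
    return paths


def save_hf_shard(shard_id, path):
    """Build one shard with Dataset.from_generator and write it to disk."""
    from datasets import Dataset  # imported here so the helper above stays generic

    def gen(shard_id):
        # Hypothetical generator: replace with your real per-shard logic.
        for i in range(1000):
            yield {"shard": shard_id, "value": i}

    ds = Dataset.from_generator(gen, gen_kwargs={"shard_id": shard_id})
    ds.save_to_disk(path)


def load_all(paths):
    """Reload every finished shard and concatenate them into one Dataset."""
    from datasets import concatenate_datasets, load_from_disk

    return concatenate_datasets([load_from_disk(p) for p in paths])
```

You would then call something like `paths = build_in_shards(range(8), "my_dataset_shards", save_hf_shard)` followed by `ds = load_all(paths)`. If the script dies partway through, running it again only rebuilds the shards that are missing. The trade-off is that progress is saved per shard, not per example, so smaller shards mean less work is lost on a crash.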

