[ Dataset.from_generator ] Prevent caching during upload

Is it possible to disable caching when using Dataset.from_generator (docs here)?

I like the method because it doesn’t require me to keep the entire dataset in memory. However, it caches so much data that I run out of disk space before my upload is complete.

A Dataset always caches the full dataset on disk. If you want to stream the data you should use an IterableDataset.

However, IterableDataset doesn’t implement push_to_hub() yet. It would be great to add this method for cases like yours.
