Optimizing Disk Usage for Large (Audio) Datasets

I’m not familiar with the datasets library, but I wonder if iter_archive could be used?
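
To make the idea concrete: as far as I understand it, iter_archive on the datasets download manager reads a TAR archive sequentially and yields (path, file object) pairs, so the audio files never have to be extracted to disk. Below is a rough sketch of that pattern using only the Python standard library. It is not the library's actual code, and iter_tar_members is a hypothetical helper name I made up for illustration.

```python
import io
import tarfile


def iter_tar_members(fileobj):
    """Yield (path, file-like object) pairs from a tar stream.

    Hypothetical helper, not part of the datasets library. The archive is
    opened in streaming mode ("r|*"), so members are read one after another
    and nothing is written to disk.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                # In streaming mode each file object must be read before
                # moving on to the next member.
                yield member.name, tar.extractfile(member)


if __name__ == "__main__":
    # Build a tiny in-memory archive with placeholder bytes standing in for audio.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("clip_0.wav", b"abc"), ("clip_1.wav", b"defg")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)

    for path, f in iter_tar_members(buf):
        print(path, len(f.read()))
```

The trade-off is that access is strictly sequential: you save the disk space of an extracted copy, but you cannot jump to an arbitrary file without reading past the ones before it.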