How does cache work?

/downloads contains the downloaded data files, and /imagenet-1k contains an .arrow file generated from them. The images are already JPEG-encoded, so the conversion from TAR to Arrow can hardly compress them any further. Hence, the cache takes up roughly twice the original dataset’s size.

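Deleting /downloads should work: once the .arrow file has been generated, the dataset is loaded from it rather than from the original archives.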
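If you want to check where the space goes before deleting anything, here is a minimal sketch using only the standard library. The helper names (`dir_size_bytes`, `summarize_cache`, `delete_downloads`) are made up for illustration, and `cache_root` is whatever directory holds your `downloads` and `imagenet-1k` folders (by default that is usually `~/.cache/huggingface/datasets`, but that is an assumption about your setup).

```python
from __future__ import annotations

import shutil
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under `path` (0 if it does not exist)."""
    if not path.exists():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def summarize_cache(cache_root: Path) -> dict[str, int]:
    """Return {subdirectory name: size in bytes} for each top-level folder."""
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(cache_root.iterdir())
        if child.is_dir()
    }


def delete_downloads(cache_root: Path) -> int:
    """Remove `<cache_root>/downloads` and return the number of bytes freed."""
    downloads = cache_root / "downloads"
    freed = dir_size_bytes(downloads)
    # ignore_errors=True makes this a no-op if the folder is already gone.
    shutil.rmtree(downloads, ignore_errors=True)
    return freed
```

Calling `summarize_cache(Path("~/.cache/huggingface/datasets").expanduser())` should show the two folders at a similar size; `delete_downloads(...)` then removes only the raw downloads and leaves the generated .arrow files alone.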

PS: Calling ds.cleanup_cache_files() deletes all of the dataset’s cached .arrow files except the ones listed in ds.cache_files (those are the files that are currently memory-mapped).
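A small sketch of how that could look in practice. The wrapper `cleanup_stale_cache` is a hypothetical helper, not part of the library; it only relies on `ds.cache_files` (a list of dicts with a `"filename"` key) and on `ds.cleanup_cache_files()` returning the number of files it removed.

```python
def cleanup_stale_cache(ds):
    """List the .arrow files `ds` is currently using, then drop the others.

    `ds` is expected to be a `datasets.Dataset`. Returns a tuple of
    (files still in use, number of cache files removed).
    """
    # Files backing the dataset right now; cleanup_cache_files() keeps these.
    in_use = [entry["filename"] for entry in ds.cache_files]
    # Deletes the other cached .arrow files in the same cache directory.
    removed = ds.cleanup_cache_files()
    return in_use, removed
```

With `ds` being a dataset you got from `load_dataset(...)`, `in_use, removed = cleanup_stale_cache(ds)` tells you which files were kept and how many were deleted. For a `DatasetDict` you would call it on each split.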
