Hugging Face Forums

Load dataset from a specific cache file

lhoestq February 26, 2024, 6:15pm 4

The “cache-9aaxxxxx” file should indeed be the right one.

Dataset.from_file should work. What takes time is reading the metadata of all the record batches (the chunks that make up an Arrow file); it doesn’t load the actual dataset content into memory.

Alternatively, you can use IterableDataset.from_file, which doesn’t read the metadata, but we haven’t implemented save_to_disk for IterableDataset.
