Load dataset from a specific cache file

Yes, the “cache-9aaxxxxx” file should indeed be the right one.

Dataset.from_file should work. What takes time is reading the metadata of all the record batches (the chunks that make up an Arrow file). It doesn’t load the actual dataset content into memory.

Alternatively, you can use IterableDataset.from_file, which doesn’t read the metadata up front. Note, however, that we haven’t implemented save_to_disk for IterableDataset.