The “cache-9aaxxxxx” file is indeed the right one.
`Dataset.from_file` should work. What takes time is reading the metadata of all the record batches (the chunks that make up the Arrow file); it doesn’t load the actual dataset content into memory.
Alternatively, you can use `IterableDataset.from_file`, which doesn’t read that metadata, but we haven’t implemented `save_to_disk` for `IterableDataset`.