Datasets not using the cache dir

Hi!
I have a really simple script:

import datasets
import os

dset = datasets.load_dataset("imagenet-1k", cache_dir=os.environ["DATASET_STORE"])

which should use the cached copy of the ImageNet-1k dataset stored at $DATASET_STORE. That directory exists, but as my screenshot showed, the script always tries to download the dataset online instead of using the cache.
Am I doing something wrong? Thanks!

The datasets cache uses its own structure, which is incompatible with manually downloaded files, so you should use Dataset.from_generator to parse those files instead.
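A minimal sketch of that approach, assuming the manually downloaded images sit in a hypothetical root/<label>/<image file> layout (the real archive layout may differ). iter_image_folder and build_dataset are illustrative names, not part of datasets; only Dataset.from_generator, Features, Image and Value come from the library. The datasets import is deferred into build_dataset so the generator can be tried on its own.

```python
import os
from typing import Dict, Iterator


def iter_image_folder(root: str) -> Iterator[Dict[str, str]]:
    """Hypothetical generator: yield one example per image file,
    assuming the layout root/<label>/<image file>."""
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(label_dir)):
            if name.lower().endswith((".jpeg", ".jpg", ".png")):
                yield {"image": os.path.join(label_dir, name), "label": label}


def build_dataset(root: str, cache_dir: str):
    """Hypothetical helper: turn the folder into an Arrow-backed Dataset."""
    # Imported lazily so iter_image_folder works without datasets installed.
    import datasets

    features = datasets.Features(
        {"image": datasets.Image(), "label": datasets.Value("string")}
    )
    # from_generator calls iter_image_folder(**gen_kwargs) and writes the
    # result into its own cache under cache_dir.
    return datasets.Dataset.from_generator(
        iter_image_folder,
        gen_kwargs={"root": root},
        features=features,
        cache_dir=cache_dir,
    )
```

Usage would be along the lines of build_dataset("/path/to/imagenet/train", os.environ["DATASET_STORE"]); the Image feature accepts file paths and decodes them lazily when rows are accessed.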


I think it should work, no? imagenet-1k doesn’t need manually downloaded files.

Maybe it fails to find the dataset’s loading script, which is stored in a different cache directory (dynamic_modules_cache)?
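One way to test that hypothesis is to point the modules cache at the store as well and force offline mode, so a missing script raises an error instead of silently triggering a download. This is a hypothetical helper, not part of datasets. It assumes the HF_DATASETS_CACHE, HF_MODULES_CACHE and HF_DATASETS_OFFLINE environment variables are read when datasets is first imported (check the documentation for your version), and the "modules" subdirectory is an assumed layout for $DATASET_STORE.

```python
import os
from typing import Dict


def configure_hf_caches(store: str, offline: bool = True) -> Dict[str, str]:
    """Hypothetical helper: point the datasets caches at `store`.

    Must run before `import datasets`, since these variables are assumed
    to be read at import time. The "modules" subdirectory is an assumed
    layout, not something the library requires.
    """
    env = {
        "HF_DATASETS_CACHE": store,
        "HF_MODULES_CACHE": os.path.join(store, "modules"),
    }
    if offline:
        # Fail on a cache miss instead of downloading from the Hub.
        env["HF_DATASETS_OFFLINE"] = "1"
    os.environ.update(env)
    return env
```

Call it at the very top of the script, for example configure_hf_caches(os.environ["DATASET_STORE"]), then import datasets and call load_dataset as before. If the loading script or data is not under the store, offline mode should surface an error rather than a download.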