Hi!
I have a really simple script:
import datasets
import os
dset = datasets.load_dataset("imagenet-1k", cache_dir=os.environ["DATASET_STORE"])
which should use a cached version of the ImageNet-1k dataset at $DATASET_STORE, which exists.
As the screenshot shows, it always tries to download the dataset online rather than using the cache.
Am I doing it wrong? Thanks!
The datasets library's cache structure is incompatible with manually downloaded files, so you should use Dataset.from_generator instead to parse those files.
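To illustrate that suggestion, here is a minimal sketch of feeding manually downloaded files to Dataset.from_generator. The root/<label>/<file> directory layout, the helper names iter_image_files and build_dataset, and the column names image_path and label are assumptions made for the example, not something stated in this thread.

```python
import os
from typing import Dict, Iterator


def iter_image_files(root: str) -> Iterator[Dict[str, str]]:
    """Yield one example per file, assuming a root/<label>/<file> layout (hypothetical)."""
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        for name in sorted(os.listdir(class_dir)):
            yield {"image_path": os.path.join(class_dir, name), "label": label}


def build_dataset(root: str):
    """Wrap the generator in a Dataset; the result is cached by the datasets library."""
    # Imported lazily so the generator above can be used without `datasets` installed.
    from datasets import Dataset

    return Dataset.from_generator(iter_image_files, gen_kwargs={"root": root})
```

Calling build_dataset(os.environ["DATASET_STORE"]) would then build the dataset from the local files without contacting the Hub, provided the files really follow the assumed layout.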
lhoestq
I think it should work, no? imagenet-1k doesn't need manually downloaded files.
Maybe it fails to find the loading script in the cache, which is in a different cache directory (dynamic_modules_cache)?
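One way to test that hypothesis is to point both the data cache and the modules cache at the same store and switch the library to offline mode before importing it. The sketch below is only an assumption about what might help: configure_offline_cache is a hypothetical helper, the modules subfolder is a guessed location, and HF_DATASETS_CACHE, HF_MODULES_CACHE and HF_DATASETS_OFFLINE are environment variables that, as far as I know, the datasets library reads at import time.

```python
import os


def configure_offline_cache(store: str) -> dict:
    """Point the datasets library at `store` and disable network lookups.

    Call this before `import datasets`, since the library reads these
    variables when it is first imported.
    """
    settings = {
        "HF_DATASETS_CACHE": store,
        # Hypothetical location for cached loading scripts; adjust to wherever
        # the modules cache actually lives on your machine.
        "HF_MODULES_CACHE": os.path.join(store, "modules"),
        "HF_DATASETS_OFFLINE": "1",
    }
    os.environ.update(settings)
    return settings
```

After calling configure_offline_cache(os.environ["DATASET_STORE"]), importing datasets and running the original load_dataset call should either load from the cache or fail with an error naming the missing piece, which would help narrow down whether the loading script is the problem.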