I followed the documentation here to set the cache environment variable like this:
import os
os.environ["HF_DATASETS_CACHE"] = os.path.join(os.getcwd(), "cache")
But when I load a custom dataset:
from datasets import load_from_disk
my_dataset = load_from_disk("my_dataset")
train_dataset = my_dataset["train"]
the cache directory reported by train_dataset.cache_files still seems to point to the directory of my_dataset rather than to HF_DATASETS_CACHE (see code here)
(then self._get_cache_file_path calls self.cache_files)
Is this intended behavior? Is there any way to cache all intermediate results under HF_DATASETS_CACHE?
Also, cache_dir is not supported as an argument to load_from_disk.
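For reference, here is a minimal sketch of the workaround I am considering. It relies on the cache_file_name argument of Dataset.map to send each intermediate result to a directory of my choosing. The helper name map_into_cache, the step_name argument, and the ".arrow" file naming are my own assumptions for illustration, not part of the datasets API, and I have not confirmed this is the recommended approach.

```python
import os

# Assumed target directory; same value I set for HF_DATASETS_CACHE above.
CACHE_DIR = os.path.join(os.getcwd(), "cache")


def map_into_cache(dataset, function, step_name, cache_dir=CACHE_DIR, **map_kwargs):
    """Call dataset.map and ask it to write its cache file under cache_dir.

    `dataset` is expected to be a single split (e.g. my_dataset["train"]).
    `step_name` is a hypothetical label used only to build the file name.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, f"{step_name}.arrow")
    # cache_file_name overrides the default location next to the source files.
    return dataset.map(function, cache_file_name=cache_file, **map_kwargs)
```

Usage would look like map_into_cache(my_dataset["train"], preprocess, "train_preprocessed"), where preprocess is a placeholder for whatever function is being mapped. This has to be repeated for every map call, which is why a global setting would be preferable.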