Cache for custom data loader

Hello Everyone,
I have implemented a custom loader for my data, and I’ve realized that HF copies my loader file (.py) into the .cache directory in my home, and that copy is the file that gets executed, not the one in my workspace. I wanted to ask about the reason for this behavior and, most importantly, how I can change the location where HF copies this file. I don’t want it to be in my home directory. I tried the cache_dir argument of the load_dataset function and also the cache_dir of the download_config, but neither of them affects this. Any help will be greatly appreciated.

Thanks,
Emilio

For your code to be imported properly, we move it to the datasets modules cache. Dataset scripts, whether from the Hub or from your local machine, are copied there and imported from there in Python to be executed. This way we don’t need to modify sys.path for every script, which can sometimes lead to unexpected behavior.

You can change this directory using the HF_MODULES_CACHE environment variable.
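
For example, here is a minimal sketch of setting it from Python. The directory name is hypothetical, and I am assuming the variable has to be set before `datasets` is first imported, since it appears to be read at import time. The `load_dataset` call is left commented out because the script path is only a placeholder.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical location: any writable directory outside your home works.
modules_cache = Path(tempfile.gettempdir()) / "hf_modules_cache"
modules_cache.mkdir(parents=True, exist_ok=True)

# Assumption: set this before the first `import datasets`,
# otherwise the default location may already have been picked up.
os.environ["HF_MODULES_CACHE"] = str(modules_cache)

# import datasets
# ds = datasets.load_dataset("path/to/my_loader.py")  # placeholder script path
```

Exporting the variable in your shell before launching Python should have the same effect.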
