Question on language modeling preprocessing

aswinsson · January 20, 2021, 5:54pm

I am trying to run the language modeling script run_mlm.py, but I am running into storage issues during the preprocessing of the input text data. The main issue is that the preprocessed data gets saved by default in the .cache/huggingface/datasets folder, and my .cache folder is pretty small. Is it possible to redirect the preprocessed data to a different folder?

Thanks a lot for your help.

sgugger · January 21, 2021, 3:47pm

You can change that default by setting an environment variable that controls where the cache goes. For all HF libraries, the variable is "HF_HOME".
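As a rough illustration, here is a minimal sketch of setting the variable from Python. The helper name `redirect_hf_cache` and the temporary-directory target are hypothetical, chosen only for this example; in practice you would point it at a folder on a larger disk. It assumes the variable is set before `datasets` or `transformers` is imported, since the libraries appear to read it when they are first loaded.

```python
import os
import tempfile
from pathlib import Path


def redirect_hf_cache(target_dir):
    """Hypothetical helper: point HF_HOME at target_dir, creating it if needed."""
    path = Path(target_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # Set this before importing any HF library so the new location is picked up.
    os.environ["HF_HOME"] = str(path)
    return path


# Placeholder location for the example; replace with a folder on a larger disk.
cache_root = redirect_hf_cache(Path(tempfile.gettempdir()) / "hf_home_example")
print("HF_HOME is now:", os.environ["HF_HOME"])

# Imports of HF libraries (e.g. `import datasets`) would go after this point.
```

When launching run_mlm.py from a shell, exporting HF_HOME in that shell before starting the script should have the same effect.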

aswinsson · January 21, 2021, 10:03pm

Thanks for the quick reply. It works like a charm.
