Are a dataset's download files still needed?

Hello everyone,
I downloaded a few datasets and quickly ran out of disk space… :sweat_smile:
I was wondering whether I still need the downloaded files for later use of the datasets, or whether I can delete them to free up some space.

My datasets cache directory is laid out on disk as follows:
image
I'm asking specifically about the ./downloads directory, which consists of files such as:

Edit:
I’m mostly running the run_mlm.py script on those downloaded datasets.
If I use the flag --dataset_cache_directory and point it at the dataset's directory (which is located under my .cache directory), do I still need the contents of the downloads directory?

Would appreciate any help on the topic :slight_smile:

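Hi! Yes, it’s safe to delete the files in the downloads directory. Once a dataset has been prepared, it is loaded from the processed files in the cache, so the raw downloads are only needed again if the dataset has to be rebuilt from scratch.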

PS: We plan to align our caching layout with huggingface_hub/transformers to make it easier to find/delete a (specific) dataset’s “download” files.
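
For anyone who wants to check how much space the downloads directory takes before removing it, here is a minimal sketch using only the Python standard library. It assumes the cache lives at ~/.cache/huggingface/datasets (the usual default, but yours may differ if you changed it), and the helper names `downloads_size_bytes` and `clear_downloads` are made up for this example; they are not part of the datasets library.

```python
import shutil
from pathlib import Path


def downloads_size_bytes(cache_dir: Path) -> int:
    """Total size in bytes of all regular files under <cache_dir>/downloads."""
    downloads = cache_dir / "downloads"
    if not downloads.is_dir():
        return 0
    return sum(p.stat().st_size for p in downloads.rglob("*") if p.is_file())


def clear_downloads(cache_dir: Path, dry_run: bool = True) -> int:
    """Return the bytes that are (or would be) freed.

    Nothing is deleted unless dry_run is False. Only the downloads
    subdirectory is touched; the prepared datasets are left alone.
    """
    freed = downloads_size_bytes(cache_dir)
    downloads = cache_dir / "downloads"
    if not dry_run and downloads.is_dir():
        shutil.rmtree(downloads)
    return freed


if __name__ == "__main__":
    # Assumed default location; adjust if your cache lives elsewhere.
    cache_dir = Path.home() / ".cache" / "huggingface" / "datasets"
    size_mb = clear_downloads(cache_dir, dry_run=True) / 1e6
    print(f"{size_mb:.1f} MB would be freed")
```

It defaults to a dry run so nothing is removed by accident; pass `dry_run=False` once the reported size looks right.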
