My dataset is eating too much space
I want to remove duplicate downloads and obsolete datasets that I no longer use. How do I do that?
Background
I checked the .cache folder in my user directory and found that datasets were downloaded redundantly. Inside huggingface/datasets/downloads/extracted, something that looks like the main body of librispeech is saved, but other datasets (yelp_review_full, squad, etc.) could not be found there. Instead, there was a huge amount of cache files lined up. If possible, I would like to wipe these out as well.
In .cache/huggingface/datasets you can delete all the datasets that you no longer use (they are stored as Arrow files inside directories named after the datasets you used).
In .cache/huggingface/datasets/downloads you can also remove the raw data files that were downloaded to generate the Arrow datasets.
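As a sketch of the cleanup itself, the snippet below first checks which dataset directories take the most space and then deletes the ones no longer needed. The paths here use a mock cache under /tmp purely for illustration; on your machine you would point `CACHE` at `~/.cache/huggingface/datasets` instead, and the dataset names are assumptions.

```shell
# Mock cache layout for illustration only; in practice set
# CACHE="$HOME/.cache/huggingface/datasets"
CACHE=/tmp/demo_hf_cache/huggingface/datasets
mkdir -p "$CACHE/yelp_review_full" "$CACHE/squad" "$CACHE/downloads/extracted"
dd if=/dev/zero of="$CACHE/yelp_review_full/data.arrow" bs=1024 count=10 2>/dev/null

# See which dataset directories take the most space before deleting anything
du -sh "$CACHE"/*/ | sort -rh

# Remove a dataset you no longer use, plus the raw downloads
rm -rf "$CACHE/yelp_review_full"
rm -rf "$CACHE/downloads"
```

Running `du -sh` first is a cheap safety check: it shows exactly which directories account for the disk usage, so you only `rm -rf` the ones you have confirmed you no longer need.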
First, in case you want to keep librispeech or any other audio dataset, locate the folder containing the audio files in downloads/extracted. You will probably want to keep that one for librispeech.
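To locate which of the extracted folders actually holds audio before deleting the rest, you can search for audio files by extension (librispeech ships `.flac` files). The directory layout below is a mock created in /tmp for illustration; the hash-like folder names and dataset names are assumptions, not your real cache contents.

```shell
# Mock extracted-downloads layout for illustration only; in practice use
# "$HOME/.cache/huggingface/datasets/downloads/extracted"
EXTRACTED=/tmp/demo_hf_extracted/downloads/extracted
mkdir -p "$EXTRACTED/abc123/LibriSpeech/train-clean-100"
touch "$EXTRACTED/abc123/LibriSpeech/train-clean-100/sample.flac"
mkdir -p "$EXTRACTED/def456/other_dataset"

# List every audio file under extracted/ so you know which
# top-level folders to keep before removing the others
find "$EXTRACTED" -name '*.flac'
```

Once you know which top-level folder under `extracted/` contains the audio, you can delete the sibling folders and leave that one in place.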