My dataset is eating too much space
I want to remove duplicates and obsolete datasets that I no longer use
How do I do that?
I checked the PC_user/.cache directory and found that some datasets were downloaded redundantly. Inside huggingface/datasets/downloads/extracted, something that looks like the main body of librispeech is saved, but I could not find the others (yelp_review_full, squad, etc.). Instead, there was a huge amount of cache files lined up. If possible, I would like to wipe these out as well.
In .cache/huggingface/datasets you can delete all the datasets that you no longer use (they are stored as Arrow files inside directories named after the datasets you used).
In .cache/huggingface/datasets/downloads you can also remove the raw data files that were downloaded to generate the Arrow datasets.
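A minimal sketch of the manual cleanup described above. It assumes the default ~/.cache/huggingface/datasets layout; the dataset names (librispeech_asr, yelp_review_full) are just examples, and the commands are demonstrated here on a throwaway copy built under mktemp so you can try them safely before touching the real cache.

```shell
# Throwaway copy of the default cache layout (replace CACHE with
# "$HOME/.cache/huggingface/datasets" for the real cleanup).
CACHE=$(mktemp -d)/huggingface/datasets
mkdir -p "$CACHE/librispeech_asr" "$CACHE/yelp_review_full" "$CACHE/downloads/extracted"

# 1. Check what is using the space before deleting anything.
du -sh "$CACHE"/* | sort -h

# 2. Remove a dataset directory you no longer use
#    (the Arrow files live inside it).
rm -rf "$CACHE/yelp_review_full"

# 3. Remove the raw downloads that were used to build the Arrow files.
rm -rf "$CACHE/downloads"

# Only the dataset directories you kept remain.
ls "$CACHE"
```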
Should I do it manually? Will that cause any problems?
First, if you want to keep librispeech or any other audio dataset, locate the folder containing the audio files in downloads/extracted. You will want to keep that one for librispeech.
Other than that, you can remove the rest.
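Before wiping downloads/extracted, you can list which top-level folders actually contain audio so you know what to keep. A hedged sketch, again run on a throwaway tree: the folder names (abc123, def456) are hypothetical stand-ins for the hashed directory names the cache uses, and it assumes audio files end in .flac (as librispeech's do) or .wav; point EXTRACTED at ~/.cache/huggingface/datasets/downloads/extracted for the real run.

```shell
# Throwaway stand-in for downloads/extracted with one audio folder
# and one non-audio folder (names are hypothetical hashed dirs).
EXTRACTED=$(mktemp -d)
mkdir -p "$EXTRACTED/abc123/LibriSpeech/dev-clean" "$EXTRACTED/def456/text_data"
touch "$EXTRACTED/abc123/LibriSpeech/dev-clean/sample.flac"
touch "$EXTRACTED/def456/text_data/train.json"

# Top-level extracted folders that contain audio files: keep these,
# everything else under extracted/ can be removed.
AUDIO_DIRS=$(find "$EXTRACTED" -type f \( -name '*.flac' -o -name '*.wav' \) \
  | sed "s|$EXTRACTED/||" | cut -d/ -f1 | sort -u)
echo "$AUDIO_DIRS"
```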