To delete the cache for a specific model or dataset, you can also locate and delete it directly using your OS’s file manager. However, it can be hard to find…
Personally, I recommend using the HF CLI as it’s the most reliable method.
No. You do not need to reload the dataset to delete its cache.
How to delete without loading:
- Hub cache (the 60 GB that “scan-cache” reports)
Delete the repo from the Hub cache directly. No Python, no dataset load.
# preview
hf cache ls --filter "repo_id==dataset/MLCommons/ml_spoken_words"
hf cache rm dataset/MLCommons/ml_spoken_words --dry-run
# delete
hf cache rm dataset/MLCommons/ml_spoken_words -y
# if your cache lives elsewhere
hf cache rm dataset/MLCommons/ml_spoken_words -y --cache-dir /path/to/hf/hub
This is the supported way to surgically remove a dataset repo from the Hub cache. (Hugging Face)
- Datasets processed cache (the ~153 GB under `~/.cache/huggingface/datasets`)
You can remove those Arrow/processed files by path. No need to construct a `Dataset` in Python.
# find the directories for this dataset
find ~/.cache/huggingface/datasets -maxdepth 3 -type d -iname '*ml_spoken_words*' -print
# common space hogs you can delete safely (shared by all datasets, not just this one)
rm -rf ~/.cache/huggingface/datasets/downloads/extracted # extracted archives
rm -rf ~/.cache/huggingface/datasets/downloads # raw archives (also removes extracted/)
# remove only this dataset's processed shards (after confirming paths via `find`)
rm -rf ~/.cache/huggingface/datasets/*ml_spoken_words*
Hugging Face’s Datasets docs and forum confirm: processed caches live under ~/.cache/huggingface/datasets, and it is safe to delete downloads/ and dataset-specific folders when you want to reclaim space. (Hugging Face)
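If you would rather preview the path-based cleanup from Python before running `rm -rf`, the following is a minimal sketch that uses only the standard library and never loads the dataset. The helper names (`dir_size`, `find_dataset_dirs`, `remove_dirs`) and the top-level name match are illustrative assumptions, not part of the `datasets` API, and the cache location is assumed to be the default one.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of the files under `path` (symlinks skipped)."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def find_dataset_dirs(cache_root: Path, name: str) -> list[Path]:
    """List top-level cache directories whose name contains `name` (case-insensitive)."""
    if not cache_root.is_dir():
        return []
    needle = name.lower()
    return sorted(
        d for d in cache_root.iterdir() if d.is_dir() and needle in d.name.lower()
    )


def remove_dirs(dirs: list[Path], dry_run: bool = True) -> int:
    """Report each directory, delete it unless `dry_run`, and return the bytes involved."""
    freed = 0
    for d in dirs:
        size = dir_size(d)
        freed += size
        action = "would remove" if dry_run else "removing"
        print(f"{action} {d} ({size / 1e9:.2f} GB)")
        if not dry_run:
            shutil.rmtree(d)
    return freed


if __name__ == "__main__":
    # Assumed default location; adjust if your Datasets cache lives elsewhere.
    root = Path.home() / ".cache" / "huggingface" / "datasets"
    remove_dirs(find_dataset_dirs(root, "ml_spoken_words"), dry_run=True)
```

Run it once with `dry_run=True` to confirm the matched paths, then switch to `dry_run=False` to actually delete them.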
Notes and alternatives:
- The `cleanup_cache_files()` API does require a `Dataset` object, which implies loading, so skip it if the load is slow and just delete by path as above. The method exists, but it is optional. (Hugging Face)
- If you prefer a lighter Python route, you can still avoid a full prepare by cleaning the Hub cache with the CLI, then deleting the Datasets cache directories by path; this matches the official cache model split: Hub cache at `~/.cache/huggingface/hub`, Datasets cache at `~/.cache/huggingface/datasets`. (Hugging Face)
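If your caches are not in the default location, it helps to resolve both roots before deleting anything. The sketch below is a standard-library approximation of how the two locations are usually derived from the `HF_HOME`, `HF_HUB_CACHE`, `HF_DATASETS_CACHE`, and `XDG_CACHE_HOME` environment variables, plus the `datasets--<org>--<name>` folder naming used in the Hub cache. The function names are hypothetical, and the resolution order is an assumption based on the documented defaults, so verify it against your installed library versions.

```python
import os
from pathlib import Path


def hf_home() -> Path:
    """Cache root: $HF_HOME, else $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface."""
    if "HF_HOME" in os.environ:
        return Path(os.environ["HF_HOME"]).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface"


def hub_cache_dir() -> Path:
    """Hub cache: $HF_HUB_CACHE if set, else <hf_home>/hub (assumed default)."""
    override = os.environ.get("HF_HUB_CACHE")
    return Path(override).expanduser() if override else hf_home() / "hub"


def datasets_cache_dir() -> Path:
    """Datasets cache: $HF_DATASETS_CACHE if set, else <hf_home>/datasets (assumed default)."""
    override = os.environ.get("HF_DATASETS_CACHE")
    return Path(override).expanduser() if override else hf_home() / "datasets"


def hub_repo_folder(repo_type: str, repo_id: str) -> str:
    """Folder name of a repo inside the Hub cache, e.g. 'datasets--org--name'."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")


if __name__ == "__main__":
    print("Hub cache:     ", hub_cache_dir())
    print("Datasets cache:", datasets_cache_dir())
    print("Repo folder:   ", hub_cache_dir() / hub_repo_folder("dataset", "MLCommons/ml_spoken_words"))
```

The printed Hub path is what you would pass to `--cache-dir`, and the Datasets path is the root to search when deleting processed files by path.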
Summary: use `hf cache rm ...` for the Hub cache, and delete the dataset’s folders under `~/.cache/huggingface/datasets` for processed data. No dataset reload required. (Hugging Face)