My Hub dataset storage shows 595 GB even though I only uploaded approximately 4 GB of datasets

Hi HuggingFace team!
Thanks for building a great platform. I’m having an issue with the new storage limits: my dashboard shows 595 GB used, but my actual dataset usage is approximately 4 GB. Could you please take a look at this?


The issue you’re describing is usually a discrepancy between how storage is calculated on your dashboard and the current size of your datasets. This often happens when:

  1. Dataset versions: the Hub keeps every Git revision of a repository, so old versions of large files still count toward your quota even after they are replaced or deleted in a later commit (see the sketch after this list).
  2. Artifacts or files: logs, checkpoints, or other large files committed alongside your data may be contributing to the storage usage.
  3. Snapshots or backups: automatic snapshots of datasets or models might be retained, which can inflate the reported usage.
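
To quantify the gap, here is a minimal sketch using huggingface_hub (it assumes you have run `huggingface-cli login`; "your-username" is a placeholder). It sums the file sizes in the current revision of each of your datasets; if the total is far below the dashboard figure, old revisions are the likely culprit:

```python
from huggingface_hub import HfApi

api = HfApi()
username = "your-username"  # placeholder: your Hub username

total = 0
for ds in api.list_datasets(author=username):
    # files_metadata=True populates per-file sizes for the current revision only
    info = api.dataset_info(ds.id, files_metadata=True)
    size = sum(f.size or 0 for f in info.siblings)
    total += size
    print(f"{ds.id}: {size / 1e9:.2f} GB (current revision)")

print(f"Total across current revisions: {total / 1e9:.2f} GB")
```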

To resolve this issue, follow these steps:

1. Check Dataset Versions

  • Review the commit history of each dataset repository. Because the Hub retains every revision, deleting or overwriting a file in a new commit does not free the space used by earlier revisions.
  • To actually reclaim space, delete the underlying LFS files from the repository settings (“List LFS files”) or squash the repository history, as in the sketch below.
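
huggingface_hub exposes super_squash_history for the squash option. A minimal sketch (the repo ID is a placeholder; squashing is destructive and irreversible, so double-check before running it):

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/your-dataset"  # placeholder

# Inspect how many revisions the repo is retaining
for commit in api.list_repo_commits(repo_id, repo_type="dataset"):
    print(commit.commit_id[:8], commit.title)

# Irreversibly collapse the branch into a single commit so that
# storage held by superseded LFS files can be reclaimed
api.super_squash_history(repo_id=repo_id, repo_type="dataset")
```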

2. Clear Cache

  • Note that the local cache does not count toward your Hub storage quota, but it is worth cleaning if your concern is disk usage on your own machine. Run the following command for an interactive cleanup:

```bash
huggingface-cli delete-cache
```

  • Alternatively, locate the cache directory (default is ~/.cache/huggingface) and remove unnecessary files, or use the programmatic scanner shown below.
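
For a scriptable version of the same cleanup, huggingface_hub ships a cache scanner. A minimal sketch that frees every cached revision not referenced by a repo’s main branch (adjust the filter to your needs):

```python
from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()
print(f"Cache size on disk: {cache_info.size_on_disk / 1e9:.2f} GB")

# Collect every cached revision that no ref named "main" points to
stale = [
    rev.commit_hash
    for repo in cache_info.repos
    for rev in repo.revisions
    if "main" not in rev.refs
]

# Build a deletion plan, report the expected savings, then run it
strategy = cache_info.delete_revisions(*stale)
print(f"Will free {strategy.expected_freed_size / 1e9:.2f} GB")
strategy.execute()
```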

3. Inspect Models

  • Model repositories count toward the same account storage as datasets, so check the size and number of any fine-tuned models or saved checkpoints in your account.
  • Clean up outdated or unused models; the same revision-history caveat applies. A sketch for listing and deleting model repos follows below.
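
A minimal sketch for surveying and removing model repos (both repo IDs are placeholders, and deletion is permanent):

```python
from huggingface_hub import HfApi

api = HfApi()
username = "your-username"  # placeholder: your Hub username

# Survey the model repos under your account
for model in api.list_models(author=username):
    print(model.id)

# Permanently delete a model repo you no longer need
api.delete_repo("your-username/old-checkpoint", repo_type="model")
```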

4. Contact Support

  • If the storage discrepancy persists, it’s best to contact the Hugging Face support team for clarification. Provide them with:
    • Your account details.
    • The size of the datasets you’ve uploaded.
    • Screenshots or details about the storage displayed on your dashboard.