How to get the size of a dataset?

How can I get the download size and the total (unzipped) size of any dataset on Hugging Face?
It would be nice if this were displayed in the UI.

I found the guide "Get the number of rows and the size in bytes", but it does not work when the dataset is not compatible with the dataset viewer. For example, it is not possible for SEED-Bench (AILab-CVC/SEED-Bench).

There are two ways to know the size of a dataset:

  • the /size endpoint, as you mentioned: limited to the datasets compatible with the dataset viewer
  • the metadata filled in the README YAML frontmatter
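For the first option, here is a minimal sketch of calling the /size endpoint from Python. It assumes the dataset viewer API is served at https://datasets-server.huggingface.co and that the response contains a `size.dataset` object with the fields read below; the function names are made up for illustration, so check the field names against the guide before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the dataset viewer API.
SIZE_ENDPOINT = "https://datasets-server.huggingface.co/size"


def build_size_url(dataset: str) -> str:
    """Build the /size request URL for a repository ID such as 'user/name'."""
    return f"{SIZE_ENDPOINT}?{urlencode({'dataset': dataset})}"


def summarize_size(payload: dict) -> dict:
    """Pick the dataset-level totals out of a /size response (assumed layout)."""
    info = payload["size"]["dataset"]
    return {
        "num_rows": info.get("num_rows"),
        "original_files_bytes": info.get("num_bytes_original_files"),
        "parquet_files_bytes": info.get("num_bytes_parquet_files"),
        "in_memory_bytes": info.get("num_bytes_memory"),
    }


def fetch_size(dataset: str) -> dict:
    """Query the endpoint; urlopen raises HTTPError on a non-2xx response,
    which is what to expect for a dataset the viewer cannot process."""
    with urlopen(build_size_url(dataset), timeout=30) as response:
        return summarize_size(json.load(response))


# Example call (needs network access):
# print(fetch_size("AILab-CVC/SEED-Bench"))
```

The URL building and the parsing are kept separate from the network call so they can be checked offline against a saved response.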

If either of these is available, the size is shown in the right column of the dataset page.

Re:

but it does not work when the dataset is not compatible with the dataset viewer

Most of the datasets should be compatible. A big part of the incompatible datasets are empty repositories.
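For a dataset that is not compatible with the viewer, the README metadata is the remaining place to look. Below is a rough sketch that reads size fields from the YAML frontmatter of a README that has already been downloaded as text. It assumes the sizes are stored as `download_size` and `dataset_size` entries (typically under `dataset_info`); many READMEs will not have them, in which case the lists come back empty. The helper names are hypothetical, and the regex approach is a shortcut instead of a real YAML parser.

```python
import re


def extract_front_matter(readme_text: str) -> str:
    """Return the YAML block between the leading '---' markers, or '' if absent."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", readme_text, re.DOTALL)
    return match.group(1) if match else ""


def extract_sizes(readme_text: str) -> dict:
    """Collect every download_size / dataset_size value found in the frontmatter.

    Each key maps to a list because a README may describe several configurations.
    The key names are an assumption about how the metadata is written.
    """
    front_matter = extract_front_matter(readme_text)
    sizes = {}
    for key in ("download_size", "dataset_size"):
        values = re.findall(rf"^\s*{key}:\s*(\d+)\s*$", front_matter, re.MULTILINE)
        sizes[key] = [int(value) for value in values]
    return sizes
```

Here `download_size` would correspond to the size of the files to download and `dataset_size` to the size once prepared, both in bytes, if the dataset author filled them in.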
