Calculate custom dataset size

Suppose I have a custom dataset and have converted it to a HF Datasets object. Is there a way to calculate the dataset size in GB from this object?

Hi! You can use the following expression to get the size in GB of the Arrow data backing a HF dataset: hf_dataset.data.nbytes / 1e9
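As a minimal sketch, the expression above can be wrapped in a small helper. The function name dataset_size_gb is made up for illustration; the only thing it relies on is the hf_dataset.data.nbytes attribute mentioned in the answer. Note that 1e9 gives decimal gigabytes (GB), not GiB.

```python
def dataset_size_gb(hf_dataset):
    """Return the size of the dataset's in-memory table in decimal GB.

    `dataset_size_gb` is a hypothetical helper name. It only assumes that
    `hf_dataset.data.nbytes` exists, as described in the answer above.
    """
    return hf_dataset.data.nbytes / 1e9


# Usage (assuming `hf_dataset` is an already-built HF Datasets object):
# print(f"{dataset_size_gb(hf_dataset):.3f} GB")
```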


Note that for vision and speech datasets, hf_dataset.data may only contain the paths to local files. If you want the total size in bytes of all the image/audio files, you may need to iterate over the files yourself and sum their sizes.
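A rough sketch of that manual approach, assuming you can already collect the local file paths as plain strings (how you get them depends on your dataset; the helper name total_file_size_bytes and the column name in the usage comment are hypothetical):

```python
import os


def total_file_size_bytes(paths):
    """Sum the on-disk sizes of the given files, in bytes.

    `paths` is any iterable of local file paths. Entries that are None or
    that do not point to an existing file are skipped rather than raising.
    """
    total = 0
    for path in paths:
        if path is not None and os.path.isfile(path):
            total += os.path.getsize(path)
    return total


# Usage (assumes a hypothetical column "file_path" holding path strings):
# size_gb = total_file_size_bytes(hf_dataset["file_path"]) / 1e9
```

Adding this to hf_dataset.data.nbytes gives an estimate of the table plus the files it references.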


Fantastic. Many thanks @mariosasko and @lhoestq!