Suppose I have a custom dataset that I have converted to an HF Datasets object. Is there a way to calculate the dataset size in GB from this object?
Hi! You can use the following expression to get the size in GB of an HF dataset:
hf_dataset.data.nbytes / 1e9
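As a small sketch of the same calculation (the helper name dataset_size is my own, not part of the library), this reports the size in both decimal GB and binary GiB, since the two differ by about 7%. The stand-in object is only there so the snippet runs on its own:

```python
from types import SimpleNamespace


def dataset_size(ds):
    """Return (GB, GiB) for the Arrow table backing `ds`.

    Assumes `ds.data.nbytes` is available, as in the expression above.
    """
    nbytes = ds.data.nbytes
    return nbytes / 1e9, nbytes / 2**30


# Stand-in object (an assumption for illustration) so the sketch runs
# without a real dataset; normally you would call dataset_size(hf_dataset).
fake_ds = SimpleNamespace(data=SimpleNamespace(nbytes=3 * 2**30))
gb, gib = dataset_size(fake_ds)
print(f"{gb:.2f} GB, {gib:.2f} GiB")  # prints "3.22 GB, 3.00 GiB"
```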
Note that for vision and speech datasets, hf_dataset.data may only contain the paths to local files, so nbytes will not include the files themselves. If you want the total size in bytes of all the image/audio files, you would need to iterate over those files yourself and check their sizes.
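If you do need the on-disk size of the media files, here is a rough sketch using only the standard library. The helper name total_file_size_gb and the paths argument are my own placeholders; how you collect the paths depends on how your dataset stores them.

```python
import os


def total_file_size_gb(paths):
    """Sum the on-disk sizes of the given files and return the total in GB.

    Skips None entries, duplicates, and paths that are not local files.
    """
    total_bytes = 0
    seen = set()
    for path in paths:
        if path is None or path in seen or not os.path.isfile(path):
            continue
        seen.add(path)  # count each file once even if several rows share it
        total_bytes += os.path.getsize(path)
    return total_bytes / 1e9
```

One way to get the paths, assuming an image column named "image", is to cast it with Image(decode=False) and read row["image"]["path"] from each row; the same idea should apply to audio columns with Audio(decode=False). Please check this against your own data, since the path can be missing when the bytes are embedded in the dataset.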