Understanding data of dataset_infos.json

Hi everyone,

I was exploring dataset_infos.json and couldn't figure out what some of the keys in the file represent. Could someone please point me to a reference that I could use for the key descriptions?

Some examples of keys I found confusing: "download_size", "dataset_size", "size_in_bytes", "post_processing_size" and "num_bytes" (under splits).

Two other keys I couldn't interpret were "post_processed" and "supervised_keys".

Is the structure documented anywhere, or would diving into the code behind the datasets-cli test command be the right way to figure this out?

Example from Cifar-10 (canonical):

hey @dk-crazydiv you can find a description of all the DatasetInfo fields in the docs: Main classes — datasets 1.8.0 documentation

If something is unclear or could be improved, feel free to open a PR!
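In case it helps, here is a minimal sketch of how the size keys seem to relate to each other, going by my reading of the DatasetInfo docs: "num_bytes" is per split, "dataset_size" is the combined size of the generated splits, "download_size" is the size of the downloaded source files, and "size_in_bytes" is the two added together. "supervised_keys" names the input and label features, and "post_processed" / "post_processing_size" only matter when a dataset has a post-processing step. The JSON below is a hypothetical, hand-written excerpt with invented numbers (not the real CIFAR-10 values), and `summarize` is just an illustrative helper, not part of the library.

```python
import json

# Hypothetical excerpt shaped like one entry of dataset_infos.json.
# Every number here is invented for illustration only.
SAMPLE = """
{
  "plain_text": {
    "supervised_keys": {"input": "img", "output": "label"},
    "splits": {
      "train": {"name": "train", "num_bytes": 1000, "num_examples": 10},
      "test": {"name": "test", "num_bytes": 200, "num_examples": 2}
    },
    "download_size": 500,
    "post_processing_size": null,
    "dataset_size": 1200,
    "size_in_bytes": 1700
  }
}
"""


def summarize(infos):
    """Collect the size figures for each config, recomputing the split total."""
    summary = {}
    for config_name, info in infos.items():
        # Sum the per-split "num_bytes" values so they can be compared
        # against the stored "dataset_size".
        split_bytes = sum(s["num_bytes"] for s in info["splits"].values())
        summary[config_name] = {
            "split_bytes": split_bytes,
            "download_size": info["download_size"],
            "dataset_size": info["dataset_size"],
            "size_in_bytes": info["size_in_bytes"],
        }
    return summary


summary = summarize(json.loads(SAMPLE))
for name, sizes in summary.items():
    print(name, sizes)
```

To try it on a real file, replace `json.loads(SAMPLE)` with `json.load(open("dataset_infos.json"))` and check whether the same relationships hold for your dataset.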

Thank you. This explains it.
