How to configure dataset.description for a dataset without a loading script?

johann-petrak · October 11, 2023, 10:37am

As the documentation states, if the data is in some known format, no loading script is necessary and that works just fine.

However, when loading the dataset, the “description” attribute is empty and I cannot figure out how to configure it (the metadata editor on the dataset card page does not contain a description field).

What is the recommended way to set this and possibly other features of the dataset?

mariosasko · October 11, 2023, 12:45pm

You can add this info to the dataset card (as Markdown). The datasets project started as a fork of TensorFlow Datasets when the Hub (for datasets) did not exist. Hence, most DatasetInfo attributes come from the fork (e.g., description, homepage, etc.) and are not integrated well with the Hub (or datasets), so we plan to deprecate this class eventually.

johann-petrak · October 11, 2023, 1:40pm

Thanks for this info, but I am still confused: there seem to be 3 different ways to provide information about a dataset: the features/attributes that can be set in the loader for the Dataset class, the metadata in the YAML part of the README file, and the textual information in the Markdown part.

The problem is that it is not clear what is supported in the YAML part of the README, and which of those fields make it into the attributes and are thus available programmatically.

So, in order to make the description available in the program from the dataset representation, can I do this without implementing a loader class? And if that functionality gets deprecated, obviously it would not be wise to implement it now, but how will the metadata then become available in code?

In other words: what is the best and most future-proof way to specify all metadata so that it shows up on the Hub AND is available within Python via the API, including the description?

mariosasko · October 12, 2023, 6:47pm

You can put this info in the Dataset Description part (after importing the template), then use huggingface_hub’s RepoCard API to download and parse the card.
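A minimal sketch of that approach, assuming the card keeps the template's "Dataset Description" heading. The helper names `extract_section` and `load_description` are hypothetical and only for illustration; `DatasetCard.load` and `card.text` come from huggingface_hub. The heading matching is deliberately naive and would be confused by `#` lines inside fenced code blocks in the card.

```python
import re


def extract_section(markdown: str, heading: str) -> str:
    """Return the text under the first heading named `heading`.

    The section ends at the next heading of the same or a higher level.
    Returns an empty string if the heading is not found.
    """
    body = []
    level = None  # heading level of the section we are inside, if any
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if match:
            hashes, title = match.groups()
            if level is None:
                # Still searching for the requested heading.
                if title.lower() == heading.lower():
                    level = len(hashes)
                continue
            if len(hashes) <= level:
                break  # reached the next sibling or parent section
        if level is not None:
            body.append(line)
    return "\n".join(body).strip()


def load_description(repo_id: str, heading: str = "Dataset Description") -> str:
    """Download a dataset card from the Hub and return one section of it."""
    # Imported lazily so extract_section stays usable without the package.
    from huggingface_hub import DatasetCard

    card = DatasetCard.load(repo_id)  # fetches and parses README.md
    # card.data holds the YAML metadata; card.text is the Markdown body.
    return extract_section(card.text, heading)
```

Calling `load_description("user/my-dataset")` (a placeholder repo id) needs network access and `pip install huggingface_hub`. The YAML fields are available separately through `card.data.to_dict()`.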
