All of the dataset examples appear to hard-code the list of labels, e.g. ag_news:
    def _info(self):
        return datasets.DatasetInfo(
            description=_DESCRIPTION,
            features=datasets.Features(
                {
                    "text": datasets.Value("string"),
                    "label": datasets.features.ClassLabel(names=["World", "Sports", "Business", "Sci/Tech"]),
                }
            ),
            homepage="http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html",
            citation=_CITATION,
            task_templates=[TextClassification(text_column="text", label_column="label")],
        )
I’d like to load my labels from a file, i.e. either use the names_file argument of ClassLabel, or read a JSON file directly and construct the names argument from it.
The issue I’m having is that the _info(self) method doesn’t give me access to a download_manager, so I cannot get the path to names_file in the cache.
I don’t want to hard-code my labels. I have different variants with different labels, and I want to include a metadata file that lists the labels for each variant along with additional identifiers.
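To make the goal concrete, here is a minimal sketch of the parsing step I have in mind. The metadata layout (one entry per variant with a "labels" list), the variant names, and the load_label_names helper are all made up for illustration. It only covers reading the file, not how to obtain its cached path inside _info, which is the part I’m stuck on.

```python
import json
from pathlib import Path


def load_label_names(metadata_path, variant):
    """Return the ordered list of label names for one dataset variant.

    Assumes a hypothetical metadata layout such as:
        {"variant_a": {"labels": ["World", "Sports"], "id": "..."}, ...}
    """
    with Path(metadata_path).open(encoding="utf-8") as f:
        metadata = json.load(f)
    if variant not in metadata:
        raise KeyError(
            f"Unknown variant {variant!r}; expected one of {sorted(metadata)}"
        )
    # Copy into a plain list so the caller gets the names in file order.
    return list(metadata[variant]["labels"])


# Intended use inside _info(), once the path to the file is known
# (path_to_metadata is a placeholder, not something I currently have):
#   names = load_label_names(path_to_metadata, self.config.name)
#   "label": datasets.features.ClassLabel(names=names)
```

The resulting list would replace the hard-coded names=[...] in the ag_news example above.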
Note: I’ve also posted this to Stack Overflow as "Creating a Huggingface Dataset with categorical class labels from a file".