Hi!
Congrats on this amazing library.
I am uploading a dataset programmatically using push_to_hub
and defining the features as follows:
from datasets import ClassLabel, Dataset, Features, Value

# ds contains text and label strings
hf_ds = Dataset.from_dict(
    ds,
    features=Features({
        "text": Value("string"),
        "label": ClassLabel(names=['World', 'Sports', ..])
    })
)
hf_ds.push_to_hub("Recognai/corrected_labels_ag_news")
The thing is that even though I see the ClassLabel feature when I inspect hf_ds.features, the dataset preview shows the labels as integers and seems to indicate they’ve been given the int type.
Is there something I’m doing wrong on my side?
For reference this is the dataset: Recognai/corrected_labels_ag_news · Datasets at Hugging Face
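For anyone reading along, here is a rough sketch of why integers show up at all. A class-label feature stores each label as an integer index into its list of names, so the strings only come back if the reader also knows that list. The snippet below is plain Python and only mimics the idea; it is not the `datasets` implementation, and the variable names (`str2int`, `raw_labels`, and so on) are made up for illustration. Only the two label names visible in the post are used, since the rest are elided there.

```python
# Hypothetical stand-in for a class-label feature (NOT the datasets implementation).
names = ["World", "Sports"]  # only the names visible in the post; the others are elided

# Map each label name to its integer index.
str2int = {name: idx for idx, name in enumerate(names)}

raw_labels = ["Sports", "World", "Sports"]

# What ends up stored: integer indices, not strings.
encoded = [str2int[label] for label in raw_labels]

# What a viewer could display if it also had access to `names`.
decoded = [names[idx] for idx in encoded]

print(encoded)
print(decoded)
```

So seeing integers in the stored data is expected; the question is only whether the list of names travels along with the data.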
Sorry, I’ve seen this has already been tackled here:
huggingface:master
← huggingface:push-dataset_infos.json-to-hub
opened 02:07PM - 21 Dec 21 UTC
When doing `push_to_hub`, the feature types are lost (see issue https://github.com/huggingface/datasets/issues/3394).
This PR fixes this by also pushing a `dataset_infos.json` file to the Hub, which stores the feature types.
Other minor changes:
- renamed the `___` separator to `--`, since `--` is now disallowed in a name in the back-end.
I tested this feature with datasets like conll2003 that has feature types like `ClassLabel` that were previously lost.
Close https://github.com/huggingface/datasets/issues/3394
I would like to include this in today's release (though not mandatory), so feel free to comment/suggest changes
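To make the idea behind that fix a bit more concrete, here is a minimal sketch of storing feature metadata next to the data. It uses only the standard library. The file names (`data.json`, `features_sidecar.json`) and the JSON layout are assumptions made up for this example; they are not the actual `dataset_infos.json` schema or anything the PR defines.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical label names and already-encoded labels (illustration only).
names = ["World", "Sports"]
encoded_labels = [1, 0, 1]

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)

    # The data file alone carries only integers.
    (repo / "data.json").write_text(json.dumps({"label": encoded_labels}))

    # A sidecar file (made-up name and schema) carries the label names.
    (repo / "features_sidecar.json").write_text(
        json.dumps({"label": {"names": names}})
    )

    # A reader that only sees the data file gets plain integers.
    data = json.loads((repo / "data.json").read_text())
    without_info = data["label"]

    # A reader that also loads the sidecar can restore the label names.
    info = json.loads((repo / "features_sidecar.json").read_text())
    with_info = [info["label"]["names"][idx] for idx in data["label"]]

print(without_info)
print(with_info)
```

Without the sidecar, a reader has no way to tell that the column was ever anything other than integers, which matches what the preview showed.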