ClassLabel disappears after reloading a DatasetDict from the Hub

Hello,

I have a DatasetDict containing 10 splits (‘fold_0’ to ‘fold_9’). Every Dataset in the DatasetDict has two features: “label” and “text”. Here’s a short overview:

print(my_dataset_dict)
>>> DatasetDict({
        fold_0: Dataset({
            features: ['label', 'text'],
            num_rows: 85087
        })
        fold_1: Dataset({
            features: ['label', 'text'],
            num_rows: 85076
        })
    ....
        fold_9: Dataset({
            features: ['label', 'text'],
            num_rows: 85159
        })
    })

For each Dataset, the “label” column was encoded with ClassLabel, and the “text” column is just a bunch of sentences:

print(my_dataset_dict['fold_0'].features)
>>> {'label': ClassLabel(names=['MA211', 'MA221', ..., 'V39'], id=None), 'text': Value(dtype='string', id=None)}

So far so good; this is exactly what I expect.
However, if I push it to the Hub and then load it again (in another script or in the same one, it doesn’t matter), the “label” column is no longer a ClassLabel and the label names are gone:

huggingface_hub.delete_repo(repo_id=dataset_path, repo_type='dataset', missing_ok=True)  # Just to be sure the previous DatasetDict is removed first

my_dataset_dict.push_to_hub(dataset_path)  # No problem, I see it on the Hub after that (and the real labels appear)

test_dataset_dict = datasets.load_dataset(dataset_path)  # Reloading it from the same path

print(test_dataset_dict['fold_0'].features)
>>> {'label': Value(dtype='string', id=None),
     'text': Value(dtype='string', id=None)}

As you can see, the label names are gone. This is a problem for me because I curate the data and create the dataset in one notebook, then load it back and run some ML tasks in another notebook, so I lose the real labels along the way.
I also tried forcing a re-download with test_dataset_dict = datasets.load_dataset(dataset_path, download_mode=datasets.DownloadMode.FORCE_REDOWNLOAD), but it doesn’t change anything. The text and the labels (just the integers) are loaded, but not the names of the labels. The names are definitely pushed to the Hub, because the dataset viewer shows them under the label column (each integer with its associated code right next to it).

Thanks for your help!

Any ideas?