How to rename values in a column of a Hugging Face dataset

I have a dataset consisting of train, val and test splits. I want to replace the values in the label column, currently positive, neutral and negative, with 0, 1 and 2. This is easily done with pandas, but I can’t figure out how to do this in the Hugging Face datasets framework. Help?

Take a look at the map() method (datasets.Dataset.map), which applies a function to every sample in the dataset.

I guess something like this should achieve what you want:

def map_labels(sample):
    label = sample["label"]
    sample["label"] = 0 if label == "positive" else 1 if label == "neutral" else 2
    return sample

result = dataset.map(map_labels)
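
A variant of the same idea, as a minimal sketch: a lookup table makes the mapping explicit and raises on an unexpected label instead of silently turning it into 2. The name LABEL_TO_ID and the toy samples are made up for illustration, and plain dicts stand in for dataset rows so the snippet runs without any library.

```python
# Explicit lookup table: label string -> integer ID (order taken from the question).
LABEL_TO_ID = {"positive": 0, "neutral": 1, "negative": 2}


def map_labels(sample):
    """Replace the string label in one sample with its integer ID."""
    label = sample["label"]
    if label not in LABEL_TO_ID:
        raise ValueError(f"Unexpected label: {label!r}")
    # Return a new dict so the input sample is left unmodified.
    return {**sample, "label": LABEL_TO_ID[label]}


# Plain dicts standing in for dataset rows (hypothetical data).
samples = [
    {"text": "great", "label": "positive"},
    {"text": "fine", "label": "neutral"},
    {"text": "awful", "label": "negative"},
]
converted = [map_labels(s) for s in samples]
```

The same function should work unchanged with dataset.map(map_labels). If the dataset is a DatasetDict holding the train, val and test splits, calling map on it applies the function to every split.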

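Hi! The easiest/fastest way is to cast the label column directly to the ClassLabel type. The position of each name in the names list becomes its integer ID, so positive is 0, neutral is 1 and negative is 2: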

dset = dset.cast_column("label", datasets.ClassLabel(names=["positive", "neutral", "negative"]))  # returns a new dataset, so assign the result