I have a dataset consisting of train, val and test sets. I want to remap the values in the label column from the existing strings positive, neutral and negative to 0, 1 and 2. This is easily done with pandas, but I can't figure out how to do it in the Hugging Face datasets framework. Help?
Take a look at the map() method (datasets.Dataset.map).
I guess something like this should achieve what you want:
def map_labels(sample):
    label = sample["label"]
    sample["label"] = 0 if label == "positive" else 1 if label == "neutral" else 2
    return sample

result = dataset.map(map_labels)
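Note that the chained conditional above sends any value other than "positive" or "neutral" to 2, so a typo or unexpected label would silently become "negative". If that is a concern, a stricter variant could use a dictionary lookup instead. This is only a sketch: LABEL2ID and map_labels_strict are names made up for illustration, and the function is tried here on plain dicts, which is the shape of the samples that map() passes in.

```python
# Hypothetical mapping; the keys are the label strings from the question.
LABEL2ID = {"positive": 0, "neutral": 1, "negative": 2}

def map_labels_strict(sample):
    # Raises KeyError on an unexpected label instead of silently mapping it to 2.
    sample["label"] = LABEL2ID[sample["label"]]
    return sample

# Quick check on plain dicts, no dataset needed.
samples = [
    {"text": "great", "label": "positive"},
    {"text": "okay", "label": "neutral"},
    {"text": "awful", "label": "negative"},
]
converted = [map_labels_strict(dict(s)) for s in samples]
print([s["label"] for s in converted])
```

It would be used the same way as the original function, i.e. result = dataset.map(map_labels_strict).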
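Hi! The easiest/fastest way is to directly cast the label column to the ClassLabel type. The order of names determines the integer ids (positive becomes 0, neutral 1, negative 2), and cast_column returns a new dataset, so assign the result:
import datasets
dset = dset.cast_column("label", datasets.ClassLabel(names=["positive", "neutral", "negative"]))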