How to convert string labels into ClassLabel classes for a custom dataset loaded from pandas

I am trying to fine-tune the bert-base-uncased model, but after loading my dataset from a pandas DataFrame I get the following error from trainer.train():
ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 5]))

I tried to understand the problem and I think it is related to the wrong data type. The following example illustrates this problem:
import pandas as pd
from datasets import Dataset
text = ["John", "snake", "car", "tree", "cloud", "clerk", "bike"]
labels = ["0", "1", "2", "3", "4", "0", "2"]
# create Pandas DataFrame
df = pd.DataFrame({"text": text, "label": labels})
# define data set object
ds = Dataset.from_pandas(df)

Inspecting ds.features after the last command shows the following:
{'text': Value(dtype='string', id=None),
'label': Value(dtype='string', id=None)}

While it should be (from the Hugging Face tutorial):
{'text': Value(dtype='string', id=None),
'label': ClassLabel(num_classes=5, names=['0', '1', '2', '3', '4'], names_file=None, id=None)}

My question is how to convert the 'label' column, which has a string type, into a 'label' column with the proper ClassLabel type. The tutorials say that one should use the map function, but I could not find any code examples.

Thank you for your help.

Hi @Krzysztof,

I think you can get what you want by using the features argument of Dataset.from_pandas:

import pandas as pd
from datasets import Dataset, Value, ClassLabel, Features

text = ["John", "snake", "car", "tree", "cloud", "clerk", "bike"]
# integer class ids; each id indexes into the ClassLabel names below
labels = [0, 1, 2, 3, 4, 0, 2]
df = pd.DataFrame({"text": text, "label": labels})
# ClassLabel names are expected to be strings, so the names are given as "0".."4"
features = Features({"text": Value("string"), "label": ClassLabel(num_classes=5, names=["0", "1", "2", "3", "4"])})
# define data set object with explicit features
ds = Dataset.from_pandas(df, features=features)
ds.features
# {'text': Value(dtype='string', id=None),
#  'label': ClassLabel(num_classes=5, names=['0', '1', '2', '3', '4'], names_file=None, id=None)}

Thank you, that solves my problem.