I am trying to fine-tune the bert-base-uncased model, but after loading my dataset from a pandas DataFrame, trainer.train() raises the following error:
ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 5]))
I tried to understand the problem, and I think it is caused by the label column having the wrong data type. The following example illustrates this:
import pandas as pd
from datasets import Dataset
text = ['John', 'snake', 'car', 'tree', 'cloud', 'clerk', 'bike']
labels = ['0', '1', '2', '3', '4', '0', '2']
# create Pandas DataFrame
df = pd.DataFrame({'text': text, 'label': labels})
# define data set object
ds = Dataset.from_pandas(df)
ds.features
The last command shows the following:
{'text': Value(dtype='string', id=None),
'label': Value(dtype='string', id=None)}
According to the Hugging Face tutorial, it should instead be:
{'text': Value(dtype='string', id=None),
'label': ClassLabel(num_classes=5, names=['0', '1', '2', '3', '4'], names_file=None, id=None)}
My question is how to convert the 'label' column from a string type to the proper ClassLabel type. The tutorials say that one should use the map function, but I could not find any code examples.
Thank you for your help.