I have a CSV file with two columns: thousands of sentences (column 1, ‘sentence’), each marked as ‘type1’ or ‘type2’ (column 2, ‘label’). I need to build a classifier that learns to split incoming sentences into these two categories.
I tried to load the data and pass it to:
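    import pandas as pd
    from datasets import Dataset
    from transformers import AutoModelForSequenceClassification

    # Pretrained DistilBERT with a 2-class classification head
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

    # Load the CSV ('sentence' and 'label' columns) and wrap it in a Dataset
    df = pd.read_csv('filename.csv')
    ds = Dataset.from_pandas(df)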
but it never works if I set the model’s num_labels to anything other than 1. I get dimension errors. How do I specify in the dataset that there are 2 labels? (And more generally, how do I specify that the label column is categorical, with N possible classes?)
I’m really just trying to build a basic sentence classifier from my own labeled data…
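One possible cause, assuming the ‘label’ column still holds the strings ‘type1’ and ‘type2’, is that a classification head with num_labels=2 expects integer class ids (0 and 1) rather than text. Below is a minimal, library-free sketch of that encoding step. The sample data and the helper name encode_labels are hypothetical, made up only for illustration.

```python
import csv
import io

# Hypothetical stand-in for filename.csv: a 'sentence' column and a 'label' column.
SAMPLE_CSV = """sentence,label
The cat sat on the mat.,type1
Stocks fell sharply today.,type2
A dog barked at the mailman.,type1
"""

def encode_labels(rows, label_column="label"):
    """Replace string labels with integer class ids in 0..N-1.

    Returns the encoded rows plus the label2id / id2label mappings.
    """
    # Sort the distinct label names so the id assignment is deterministic.
    names = sorted({row[label_column] for row in rows})
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    encoded = [{**row, label_column: label2id[row[label_column]]} for row in rows]
    return encoded, label2id, id2label

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
encoded_rows, label2id, id2label = encode_labels(rows)

print(label2id)                            # {'type1': 0, 'type2': 1}
print([r["label"] for r in encoded_rows])  # [0, 1, 0]
```

With mappings like these, num_labels would be len(label2id), and the same label2id / id2label dictionaries can be passed to from_pretrained so that predictions are reported by name. If I am not mistaken, the datasets library can also do this conversion itself via ds.class_encode_column("label"), which turns the column into a ClassLabel feature with N classes.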