Could someone please explain how to make a multi-label dataset from csv?

yulgm · May 30, 2022, 4:42pm

I have a csv file with two columns in which there are thousands of sentences (column 1, ‘sentence’) and they are marked as ‘type1’ and ‘type2’ (column 2, ‘label’). I need to build a classifier that learns to split incoming sentences into these two categories.

I tried to load the data and pass to:

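import pandas as pd
from datasets import Dataset
from transformers import AutoModelForSequenceClassification

# Two output classes: 'type1' and 'type2'
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Load the CSV (columns: 'sentence', 'label') into a Dataset
df = pd.read_csv('filename.csv')
ds = Dataset.from_pandas(df)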

but it never works if I set the model’s num_labels to anything other than 1. I get dimension errors. How do I specify in the dataset that there are 2 labels? (And, more generally, how do I specify that the label column is categorical, with N possible classes?)

I’m really just trying to build a basic sentence classifier from my own labeled data…

nbroad · May 31, 2022, 1:35pm

Here’s an example: Transformers-Tutorials/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub
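
If each sentence carries exactly one of the two labels, the task is single-label classification with two classes, and the string labels probably need to be converted to integer class ids (0 to N-1) before training. Below is a minimal sketch of that mapping using only the standard library. The sample rows and the helper name encode_labels are hypothetical, made up for illustration; only the column names 'sentence' and 'label' and the values 'type1' and 'type2' come from the question.

```python
import csv
import io

# Hypothetical stand-in for filename.csv: a 'sentence' column and a 'label' column.
CSV_TEXT = """sentence,label
The cat sat on the mat.,type1
Stocks fell sharply today.,type2
Dogs bark at night.,type1
"""


def encode_labels(rows):
    """Map string labels to integer class ids in 0..N-1.

    Returns the encoded rows together with both lookup tables.
    (Hypothetical helper, not part of any library.)
    """
    # Sort the distinct label names so the id assignment is deterministic.
    names = sorted({row["label"] for row in rows})
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    encoded = [
        {"sentence": row["sentence"], "label": label2id[row["label"]]}
        for row in rows
    ]
    return encoded, label2id, id2label


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
encoded, label2id, id2label = encode_labels(rows)

print(label2id)
print(encoded[1])
```

The two mappings can then be passed to the model, for example AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label2id), id2label=id2label, label2id=label2id). With the datasets library, ds.class_encode_column("label") performs a similar conversion by turning a string column into a ClassLabel feature.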

yulgm · May 31, 2022, 8:57pm

Thank you!
