Multi-label classification: getting Sequence(ClassList()) for labels

pbowyer · March 23, 2022, 3:24pm

I’m attempting to convert an image fine-tuning notebook to multi-label classification (there are a few more questions coming!). I haven’t touched Python since 2.4, so I’m rusty! The first place I’m stuck is with my labels.

My source dataframe can contain either the indices of the matched labels (e.g. [3, 5]) or a list of zeros and ones over the categories (e.g. [0, 0, 0, 1, 0, 1]). Older posts on this forum say I have to use one or the other. Either way, I understand that for Hugging Face to work, I need to convert them to Sequence(ClassLabel(names=classnames), ClassLabel(names=classnames), ClassLabel(names=classnames), ...)
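The two label formats carry the same information and can be converted into each other. Here is a minimal pure-Python sketch, assuming six classes. The helper names `to_multi_hot` and `from_multi_hot` are hypothetical and not part of any library:

```python
def to_multi_hot(indices, num_classes):
    """Turn a list of matched class indices, e.g. [3, 5], into a 0/1 vector."""
    vector = [0] * num_classes
    for i in indices:
        vector[i] = 1
    return vector


def from_multi_hot(vector):
    """Turn a 0/1 vector back into the list of positions that are set."""
    return [i for i, v in enumerate(vector) if v]


print(to_multi_hot([3, 5], 6))             # [0, 0, 0, 1, 0, 1]
print(from_multi_hot([0, 0, 0, 1, 0, 1]))  # [3, 5]
```

Which form to keep mostly depends on the model side. As far as I know, Transformers models configured with `problem_type="multi_label_classification"` use a BCE-with-logits loss, which expects a float vector per example (e.g. `[float(v) for v in to_multi_hot(...)]`), whereas the index list is the form a `Sequence` of `ClassLabel` describes.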

First question: how is the ClassLabel value set? For single-label classification, this works great:

from datasets import ClassLabel
ds = ds.cast_column("label", ClassLabel(num_classes=2, names=['accept', 'reject']))

but I don’t understand which positional or keyword argument takes the column value. I’ve looked at the source code for ClassLabel and am still no clearer.

Second question: how do I massage my labels into the right format to pass in for training? I tried this and several other forms but can’t get it to work. Or am I going in the wrong direction?

df.labels = df['labels'].apply(lambda x, cl=classlist: [ClassLabel(names=cl) for y in list(x.split(','))])
