I’m attempting to convert an image fine-tuning notebook to multi-label classification (there are a few more questions coming!). I haven’t touched Python since 2.4, so I’m rusty! The first place I’m stuck is with my labels.
My source dataframe can contain either the indices of the matched labels (e.g. [3, 5]) or a list of zeros and ones with one entry per category (e.g. [0, 0, 0, 1, 0, 1]). Older posts on this forum have said I have to use one or the other. Whichever it is, I understand that for Hugging Face to work, I need to convert them to
Sequence(ClassLabel(names=classnames), ClassLabel(names=classnames), ClassLabel(names=classnames), ...)
First Question: how is the ClassLabel value set? For single-label classification, this works great:
ds = ds.cast_column("label", ClassLabel(num_classes=2, names=['accept', 'reject']))
but I don’t understand which positional or named argument takes the column value. I’ve looked at the source code for ClassLabel and am still no clearer.
Second Question: how do I massage my labels into the right format to pass in for training? I tried the line below, and several variations of it, but cannot get it to work. Or am I going in the wrong direction?
df.labels = df['labels'].apply(lambda x, cl=classlist: [ClassLabel(names=cl) for y in list(x.split(','))])
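For concreteness, the two label encodings described above (a list of indices versus a list of zeros and ones) can be sketched in plain Python. This is only an illustration of the data shapes, not a statement of what the library requires: the helper names `parse_indices`, `to_multi_hot`, and `to_indices` are hypothetical, the six class names are made up, and it assumes the dataframe column holds comma-separated index strings such as "3,5".

```python
from typing import List


def parse_indices(raw: str) -> List[int]:
    """Turn a comma-separated string such as "3,5" into [3, 5]."""
    # Skip empty pieces so a trailing comma or an empty string is harmless.
    return [int(part) for part in raw.split(",") if part.strip()]


def to_multi_hot(indices: List[int], num_classes: int) -> List[int]:
    """Turn label indices into a zeros-and-ones list of length num_classes."""
    vector = [0] * num_classes
    for i in indices:
        if not 0 <= i < num_classes:
            raise ValueError(f"label index {i} is outside 0..{num_classes - 1}")
        vector[i] = 1
    return vector


def to_indices(multi_hot: List[int]) -> List[int]:
    """Inverse of to_multi_hot: positions of the non-zero entries."""
    return [i for i, flag in enumerate(multi_hot) if flag]


# Hypothetical class list, six categories as in the example above.
classnames = ["a", "b", "c", "d", "e", "f"]

indices = parse_indices("3,5")
multi_hot = to_multi_hot(indices, len(classnames))
print(indices)
print(multi_hot)
print([classnames[i] for i in to_indices(multi_hot)])
```

With pandas, helpers like these could be applied per row, for example `df['labels'].apply(parse_indices)`. Which of the two forms the training code expects, and how the column should be declared with ClassLabel, is still the open question.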