Multi-label token classification: "-100" special label

Hello!
I am implementing a HF-based model augmented with native PyTorch code to classify tokens (not the whole document!) into one or more classes.

Now, with single labels, using the AutoTokenizer and after aligning subwords with their labels, I get the following output. The -100 labels correspond to the [CLS] and [SEP] special tokens as well as to subwords starting with ## (not seen below); apparently, a -100 label tells the model to ignore that token when calculating the loss.

{'input_ids': [2, 4759, 4683, 3],
 'token_type_ids': [0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1],
 'labels': [-100, 3, 4, -100]}
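For context, this is the default behaviour of nn.CrossEntropyLoss, whose ignore_index defaults to -100, so those positions simply do not contribute to the single-label loss. A minimal sketch with purely illustrative tensor values:

import torch
import torch.nn as nn

# Logits for 4 token positions over 5 classes (illustrative values only).
logits = torch.randn(4, 5)
labels = torch.tensor([-100, 3, 4, -100])

# Targets equal to ignore_index (default -100) are skipped, so the
# [CLS], [SEP] and ## subword positions do not affect the loss.
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fn(logits, labels)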

However, with multi-label classification, my labels are multi-hot vectors (one binary indicator per class), as required for calculating nn.BCEWithLogitsLoss():

'labels': [XXX,
           [1, 0, 1],
           [1, 1, 0],
           [1, 0, 0],
           XXX]

What should I put here for XXX instead of the “-100” special label to tell the model to ignore special tokens as well as subword tokens?


Perhaps you can try creating a vector consisting of -100 values? So instead of XXX you could try [-100, -100, -100].
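Note that nn.BCEWithLogitsLoss has no ignore_index argument, so those -100 rows would need to be masked out manually before the loss is computed. A minimal sketch of that idea, assuming 3 classes and purely illustrative logits:

import torch
import torch.nn as nn

num_labels = 3
# Logits for 5 token positions over 3 classes (illustrative values only).
logits = torch.randn(5, num_labels)
# Rows of -100 mark [CLS], [SEP] and ## subword positions to be ignored.
labels = torch.tensor([[-100, -100, -100],
                       [1, 0, 1],
                       [1, 1, 0],
                       [1, 0, 0],
                       [-100, -100, -100]], dtype=torch.float)

# Keep only positions whose label row is not all -100.
mask = (labels != -100).all(dim=-1)
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(logits[mask], labels[mask])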

Let me know if you get some results.