Multi-label token classification

Hi @murdockthedude. I'm using some sensitive (biomedical) data and my use case is actually a little more complicated than 'just' multi-label NER, so I'd have to make up some dummy data and simplify my notebook a bit. Let me think more about that and how to make a shareable notebook.

In the meantime, here are the answers to your first two questions.

  1. The custom trainer is all that's needed, although you'll probably also want to implement a custom compute_metrics function and pass it when you instantiate the trainer so you can do early stopping (a minimal sketch of the trainer follows this list).
  2. In my case I used BertForTokenClassification, but AutoModelForTokenClassification should be fine too, I think.
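
For later readers, here is a minimal sketch of what such a custom trainer could look like. It assumes the labels arrive as multi-hot float vectors of shape (batch, seq_len, num_labels); the class name and the absence of any padding mask are my illustrative choices, not the exact code from the original project:

```python
import torch
from transformers import Trainer

class MultiLabelTokenTrainer(Trainer):
    """Trainer that scores each label dimension independently."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments (e.g. num_items_in_batch)
        # that newer transformers versions pass to compute_loss.
        labels = inputs.pop("labels")       # (batch, seq_len, num_labels)
        outputs = model(**inputs)
        logits = outputs.logits             # (batch, seq_len, num_labels)
        # BCEWithLogitsLoss treats every (token, label) pair as its own
        # binary problem, which is what multi-label classification needs.
        loss = torch.nn.BCEWithLogitsLoss()(logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```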
1 Like

Hey @drussellmrichie, totally understand, thank you. I'll try to get a small notebook working too to see if I can piece this all together.

One question I have: Assuming I implement the custom trainer approach above, at inference time for multi-label token classification, do you just take the individual output logits and run them through a sigmoid activation to get your final per-label-per-token probabilities (as opposed to single-label, which runs them through a softmax)? Or is it something more complex than that?

Thanks again!

@murdockthedude. I don't think you even need to bother with sigmoid: you can just threshold the logits at zero, as in lambda x: 1 if x > 0 else 0, which gives the same hard labels as sigmoid(x) > 0.5. There are efficient TF and PyTorch functions for that.
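
For instance, a small sketch in PyTorch, where model and inputs stand in for whatever you trained; thresholding logits at 0 is equivalent to thresholding sigmoid probabilities at 0.5, since sigmoid(0) = 0.5:

```python
import torch

with torch.no_grad():
    logits = model(**inputs).logits    # (batch, seq_len, num_labels)

probs = torch.sigmoid(logits)          # per-label-per-token probabilities
preds = (logits > 0).long()            # hard 0/1 labels; same as probs > 0.5
```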

1 Like

Sorry for the slow reply; this is great. I'm working on getting things functioning, but this approach seems promising so far. Will report back...

Hey, sorry for opening this old thread again, but this looks pretty much exactly like what I am looking for at the moment. How did you actually succeed in passing the one-hot encoded labels through to the Trainer?

Whenever I try this, DataCollatorForTokenClassification throws errors saying that it expects the labels to be integers.
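
In case it helps later readers: the stock collator pads labels as single integers, so multi-hot label vectors need a custom padding step. A minimal sketch, assuming each example's labels are a list of per-token vectors of num_labels floats, the tokenizer right-pads, and padded positions get all-zero vectors (the class name and zero-vector padding are my assumptions):

```python
from dataclasses import dataclass

import torch
from transformers import PreTrainedTokenizerBase

@dataclass
class MultiLabelTokenCollator:
    tokenizer: PreTrainedTokenizerBase
    num_labels: int

    def __call__(self, features):
        # Pull the multi-hot labels out before padding the usual inputs.
        labels = [f.pop("labels") for f in features]   # each: (seq_len, num_labels)
        batch = self.tokenizer.pad(features, return_tensors="pt")
        max_len = batch["input_ids"].shape[1]
        # Right-pad each label sequence with all-zero vectors.
        padded = [
            seq + [[0.0] * self.num_labels] * (max_len - len(seq))
            for seq in labels
        ]
        batch["labels"] = torch.tensor(padded, dtype=torch.float)
        return batch
```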

3 Likes

Can anyone please post your compute_metrics function?

Help much appreciated.
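
In case a sketch helps while waiting for an answer: a minimal multi-label compute_metrics, assuming logits and labels both have shape (batch, seq_len, num_labels), padded or ignored positions are marked with -100, and logits are thresholded at 0 as discussed above (the F1 metric and the masking convention are my assumptions):

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred             # each (batch, seq_len, num_labels)
    preds = (logits > 0).astype(int)       # threshold logits at 0
    labels = labels.reshape(-1)
    preds = preds.reshape(-1)
    mask = labels != -100                  # drop padded / special positions
    return {"f1": f1_score(labels[mask].astype(int), preds[mask])}
```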

@BunnyNoBugs @murdockthedude
I am working on a morphological analysis problem where each token has multiple labels. Can you share a sample notebook or working example so that I can understand it and experiment with my problem?

Can you share your code or a sample working example?

Unfortunately I can't. The code is in my old workplace's private laptop and git repo. :frowning:

However, we published a paper based on this work here: https://academic.oup.com/jamia/advance-article-abstract/doi/10.1093/jamia/ocad046/7099518

You might try emailing some of my coauthors, like Victor Ruiz. He might be willing to help you.

1 Like

You can take a look here. The code is a bit messy since it was done under time pressure in the end, but it gets the multi-label token classification done, and we achieved good results with it.

2 Likes

Is the data format below correct for the above piece of code?

Dataset({
    features: ['word', 'pos', 'noun_case', 'noun_gender', 'noun_number'],
    num_rows: 22
})

Here, 'pos', 'noun_case', 'noun_gender', and 'noun_number' are binary labels.
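
If it helps, one plausible way to turn binary columns like these into the multi-hot label vectors discussed above; the column names come from the dataset printout, but the mapping itself is my assumption, not the original author's code:

```python
LABEL_COLUMNS = ["pos", "noun_case", "noun_gender", "noun_number"]

def to_multi_hot(example):
    # Stack the binary columns into a single multi-hot vector per word.
    example["labels"] = [float(example[col]) for col in LABEL_COLUMNS]
    return example

dataset = dataset.map(to_multi_hot)
```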

Hey everyone!

I am also trying to implement a BERT-based, two-headed model with one multi-label classification head and one multi-class classification head. The two heads are not directly related to each other; they predict two different aspects of a token.

The challenge now is to combine these two heads into one model. So far, I have made the following observations (a sketch of such a model follows this list):

  • Use BCEWithLogitsLoss for multi-label classification (labelled with a multi-hot vector of 0s and 1s)
  • Use CrossEntropyLoss for multi-class classification (labelled with an integer ID corresponding to the class)
  • Compute and return the mean of the two loss values
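
For what it's worth, a minimal sketch of such a two-headed model, assuming both heads share the encoder, the multi-class labels use -100 for ignored positions, and the combined loss is the plain mean (the class and argument names are mine, not from any library):

```python
import torch
from torch import nn
from transformers import AutoModel

class TwoHeadedTokenModel(nn.Module):
    def __init__(self, model_name, num_multilabel, num_classes):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.multilabel_head = nn.Linear(hidden, num_multilabel)
        self.multiclass_head = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask=None, ml_labels=None, mc_labels=None):
        states = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        ml_logits = self.multilabel_head(states)   # (batch, seq_len, num_multilabel)
        mc_logits = self.multiclass_head(states)   # (batch, seq_len, num_classes)
        loss = None
        if ml_labels is not None and mc_labels is not None:
            # The BCE term assumes ml_labels are valid multi-hot vectors at
            # every position; real code should mask padded tokens here.
            ml_loss = nn.BCEWithLogitsLoss()(ml_logits, ml_labels.float())
            mc_loss = nn.CrossEntropyLoss(ignore_index=-100)(
                mc_logits.view(-1, mc_logits.size(-1)), mc_labels.view(-1)
            )
            loss = (ml_loss + mc_loss) / 2         # unweighted mean of the two losses
        return {"loss": loss, "ml_logits": ml_logits, "mc_logits": mc_logits}
```

Since forward returns a loss when both label sets are supplied, I would expect the stock Trainer to train this if you set label_names=["ml_labels", "mc_labels"] in TrainingArguments; whether the unweighted mean is the right weighting is an empirical question.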

My questions now:

  1. Is it possible to implement two model heads with the Trainer API, or do I need to implement a model class manually?
  2. What are these "class_weights" mentioned in the solution of this post?
  3. Is my assumption correct that I can just compute the mean of the two losses?
  4. Is there any reason NOT to implement this approach, rather than implementing two separate models?

Best regards,
Daniel

How is your data encoded? I assume it's not one-hot encoded, since you explicitly ignore the -100 label ID.

I think our paper has these details. I'd recommend checking that out. Should be open access:

https://academic.oup.com/jamia/advance-article-abstract/doi/10.1093/jamia/ocad046/7099518?redirectedFrom=fulltext&login=false#no-access-message

1 Like

Thanks for replying, I'll check the paper.