Sliding Window - Multilabel Classification

slyle · July 25, 2023, 4:40pm

I have multiple text fields I’d like to concatenate into a single transformer input. However, doing so will push some records over the token limit (4096).

Are there any examples or documentation covering how to handle predictions before evaluation in a multi-label classification setup where the input text is chunked with a sliding window? I can’t seem to find any.
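For concreteness, the kind of chunking in question can be sketched as below. This is a minimal sketch in plain Python over a list of token ids, not any particular tokenizer API; `chunk_with_overlap`, `chunk_records`, `max_len`, and `overlap` are hypothetical names chosen for illustration. Each chunk keeps the index of the record it came from so that predictions can be regrouped later.

```python
def chunk_with_overlap(token_ids, max_len, overlap):
    """Split one record's token ids into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens. All names here are
    hypothetical; this does not mirror a specific library's API.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the record
        start += step
    return chunks


def chunk_records(records, max_len, overlap):
    """Chunk every record and remember which record each chunk belongs to."""
    all_chunks, record_index = [], []
    for i, token_ids in enumerate(records):
        for chunk in chunk_with_overlap(token_ids, max_len, overlap):
            all_chunks.append(chunk)
            record_index.append(i)
    return all_chunks, record_index
```

The `record_index` list is what makes it possible to tie chunk-level outputs back to the original record, whichever aggregation is used afterwards.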

Because I want to use the entire document and capture every label predicted from any chunk, I think I would need to link the predictions from all chunks back to the same record before evaluation and backprop, but I see a few issues with this:

- The batch size would need to be large enough to hold all chunks from one record; otherwise the labels can’t be re-aggregated within a batch.

- I am still “truncating” context, so I could misclassify on a per-chunk basis: for example, an incorrect prediction for a label in chunk 2 when the informing context sits in chunk 1, outside the overlap/stride. Simply taking the union of the chunk predictions could therefore be incorrect, because it ignores this “out of stride” context. In other words, by not taking the mean of the logits, I am destroying the cross-chunk context I hope to incorporate. But if I take the mean, I may leave out labels that don’t appear in both chunks.
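To make the two aggregation options concrete, here is a minimal sketch, assuming per-chunk logits are already available as plain lists and that a `record_index` list maps each chunk to its record. The function names and the 0.5 threshold are assumptions for illustration, not part of any library.

```python
import math
from collections import defaultdict


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_chunk_logits(chunk_logits, record_index, how="max"):
    """Pool per-chunk logits into one logit vector per record.

    how="max":  a label keeps its strongest chunk-level score.
    how="mean": a label's score is averaged over all chunks of the record.
    """
    grouped = defaultdict(list)
    for logits, rec in zip(chunk_logits, record_index):
        grouped[rec].append(logits)

    pooled = {}
    for rec, rows in grouped.items():
        columns = list(zip(*rows))  # one tuple of chunk scores per label
        if how == "max":
            pooled[rec] = [max(col) for col in columns]
        elif how == "mean":
            pooled[rec] = [sum(col) / len(col) for col in columns]
        else:
            raise ValueError("how must be 'max' or 'mean'")
    return pooled


def predict_labels(pooled, threshold=0.5):
    """Turn pooled logits into 0/1 multi-label predictions per record."""
    return {
        rec: [int(sigmoid(z) >= threshold) for z in logits]
        for rec, logits in pooled.items()
    }


# Two chunks of the same record, two labels (made-up logits).
# Label 0 is supported by the first chunk only.
chunk_logits = [[2.0, -4.0], [-3.0, -4.0]]
record_index = [0, 0]

max_preds = predict_labels(aggregate_chunk_logits(chunk_logits, record_index, "max"))
mean_preds = predict_labels(aggregate_chunk_logits(chunk_logits, record_index, "mean"))
```

With these made-up numbers, max pooling keeps label 0 because one chunk supports it, while mean pooling averages it away. That is the trade-off described above; neither option recovers context that never appeared together inside a single chunk.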

How can I combat this? A larger stride? Perhaps a different multifield approach than concatenating all useful fields together into one input? I was thinking maybe the new LongNet might be better suited for this.
