I have a dataset with 2 columns (a toy example of the format is below):
- around 100k rows
- one column holds a large amount of free text (around 10k words per cell)
- the other column holds the list of labels this free text belongs to, usually between 5 and 30 labels per row
- the full set of unique labels is around 12k
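To make the format concrete, here is a toy example (a minimal sketch assuming a pandas DataFrame; the column names `text` and `labels` and the label strings are just placeholders):

```python
import pandas as pd

# Toy rows illustrating the two columns: long free text plus a list of labels.
df = pd.DataFrame({
    "text": [
        "First document, around 10k words of free text in practice ...",
        "Second document, also very long ...",
    ],
    "labels": [
        ["label_a", "label_b", "label_c"],  # 5 to 30 labels per row in the real data
        ["label_b", "label_d"],
    ],
})
print(df.head())
```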
I was wondering what would be the best way to train a model on this dataset, since the majority of transformer models accept a maximum of only 512 or 1024 input tokens.
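Naive truncation to 512 tokens would throw away most of each document, so the main option I can think of is some kind of overlapping sliding-window chunking, roughly like this (a minimal sketch assuming the Hugging Face `transformers` tokenizer; the model name, window size, and stride are just placeholders):

```python
from transformers import AutoTokenizer

# Placeholder model; any 512-token encoder has the same input limit.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_text(text, max_len=510, stride=128):
    """Split one long document into overlapping windows of up to 510 tokens
    (leaving room for the [CLS] and [SEP] special tokens)."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(ids), max_len - stride):
        window = ids[start:start + max_len]
        chunks.append(tokenizer.decode(window))
        if start + max_len >= len(ids):
            break
    return chunks

# A 10k-word cell easily tokenizes to well over 10k tokens,
# so each row would turn into dozens of chunks.
```

But even then I am not sure how the per-chunk predictions should be combined back into one label set per row, or whether a long-context model would be a better fit here.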