Multilabel classification using LLMs

I would like to know the best way to fine-tune LLMs for multiclass classification tasks with more than 100 classes. I assume that text generation is the main functionality of these LLMs, and most of the coding examples and documentation show only text generation.

I know that I can generate the labels by fine-tuning these text generation models on my dataset, but this will only train the LLM on the labels that are present in the training set, and some labels are missing from my training set. Ideally the training set would contain examples for every class, but that is not the case right now. Also, a text generation model might sometimes generate new classes of its own that don’t exist at all.
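One common way to guard against invented classes is to check each generated string against the closed label set before accepting it. Below is a minimal sketch of that idea using only the Python standard library. The function name `normalize_label`, the example labels and the 0.8 similarity cutoff are all assumptions for illustration, not something from a particular library.

```python
import difflib

def normalize_label(generated, allowed_labels, cutoff=0.8):
    """Map generated text to a known label, or return None if nothing is close enough.

    `cutoff` is a hypothetical similarity threshold; tune it on your own data.
    """
    cleaned = generated.strip().lower()
    # Case-insensitive lookup back to the canonical spelling of each label.
    lookup = {label.lower(): label for label in allowed_labels}
    if cleaned in lookup:
        return lookup[cleaned]
    # Tolerate small spelling slips in the generated text.
    close = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None

# Hypothetical label set, for illustration only.
ALLOWED = ["billing", "shipping", "returns", "technical support"]

exact = normalize_label("Billing", ALLOWED)             # same label, different case
fuzzy = normalize_label("techincal support", ALLOWED)   # small typo in the output
unknown = normalize_label("astronomy", ALLOWED)         # not in the label set
```

Outputs that come back as `None` can be rejected, retried or sent to a fallback. This only filters invented labels; it does nothing for real classes that have no training examples.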

I have seen on HF that there are model classes for these LLMs with sequence classification heads, such as ‘LlamaForSequenceClassification’, but I haven’t found any example of how to use them.

I know that BERT models are used for sequence classification tasks, but can I do the same with any of the LLMs? If it is possible, please share an example.

Below I attach an example of multilabel classification with an LLM model.

Thank you for your reply. I wanted to know if I can use LLMs like Llama2, Mistral, Phi2 (decoder-only models) for text classification, just like you did here with BERT (which is an encoder model).

In the Llama2 documentation on HF, they have mentioned a class -

But I am not sure how to use this class for sequence classification.

I have used Mistral for NER, which is a word-level classification task. I used RAG to retrieve similar queries (and their labels) and show the decoder model what I expect the output to look like. You can then pass in the new query, and the model should classify it according to the examples you have shown it.
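To make the retrieve-then-prompt idea concrete, here is a rough sketch in plain Python. It is not the setup used above: word-overlap (Jaccard) similarity stands in for a real embedding-based retriever, and the helper names, the prompt wording and the example data are all hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (stand-in for embeddings)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def retrieve(query, labelled_examples, k=2):
    """Return the k labelled examples most similar to the query."""
    ranked = sorted(labelled_examples, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]

def build_prompt(query, labelled_examples, k=2):
    """Build a few-shot prompt from the retrieved examples, ending at the open label slot."""
    lines = ["Classify the query. Answer with one of the labels shown in the examples.", ""]
    for text, label in retrieve(query, labelled_examples, k):
        lines += [f"Query: {text}", f"Label: {label}", ""]
    lines += [f"Query: {query}", "Label:"]
    return "\n".join(lines)

# Hypothetical labelled pool, for illustration only.
EXAMPLES = [
    ("my card was charged twice", "billing"),
    ("the parcel never arrived", "shipping"),
    ("how do I send an item back", "returns"),
    ("I was charged the wrong amount", "billing"),
]

prompt = build_prompt("why was my card charged again", EXAMPLES, k=2)
```

The resulting `prompt` string is what you would send to the decoder model. The text it generates after the final "Label:" is the prediction.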

I will say that while this works, it is certainly more natural to do this with an encoder like BERT.

If non-LLM approaches suit you, you could consider Annif, see the demo at

Annif is developed for extreme multilabel classification of texts, so it is most suitable if there are thousands or tens of thousands of labels to choose from.

but this will only train the LLMs on the labels that are present in the train dataset and still there are labels which are missing from my train dataset.

Annif utilizes two kinds of algorithms in its backends: associative and lexical. An associative algorithm suggests only the labels it has seen in its training set, whereas a lexical algorithm can suggest any label from the vocabulary, because it learns features based on word positions etc. See Backend: MLLM · NatLibFi/Annif Wiki · GitHub for more details.
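As a toy illustration of the difference (this is not Annif's code or the MLLM algorithm, just a deliberately naive sketch with made-up data and function names): an associative approach is limited to labels that occur in the training pairs, while a lexical approach can match any vocabulary term that appears in the text.

```python
def associative_candidates(training_pairs):
    """Labels an associative model could ever suggest: only those seen in training."""
    return {label for _, labels in training_pairs for label in labels}

def lexical_suggest(text, vocabulary):
    """Suggest every vocabulary label whose words all occur in the text."""
    tokens = set(text.lower().split())
    return [label for label in vocabulary if set(label.lower().split()) <= tokens]

# Hypothetical vocabulary and training data, for illustration only.
VOCAB = ["machine learning", "library science", "climate change"]
TRAIN = [("neural networks and machine learning", ["machine learning"])]

text = "effects of climate change on coastal cities"
seen_labels = associative_candidates(TRAIN)      # "climate change" is not among these
lexical_hits = lexical_suggest(text, VOCAB)      # matched directly from the vocabulary
```

Here "climate change" never occurs as a training label, so the associative side cannot propose it, while the lexical side finds it from the vocabulary alone.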

(That said, Annif could in future use some LLM or BERT for well-performing zero-shot analysis, which is why I’m interested in these topics.)

You can use a decoder-only model through a class like LlamaForSequenceClassification, as in the link above. If Hugging Face provides a sequence classification class for a model, you can use it in the same way.
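Here is a minimal sketch of how that class can be driven for multilabel classification, assuming PyTorch and transformers are installed. To avoid downloading any weights it builds a tiny, randomly initialised Llama from a LlamaConfig, so the predictions are meaningless. The layer sizes, the label count of 120 and the token ids are made up for illustration. With a real checkpoint you would load the model with `from_pretrained` and produce `input_ids` with the matching tokenizer instead.

```python
import torch
from transformers import LlamaConfig, LlamaForSequenceClassification

NUM_LABELS = 120  # hypothetical label count (more than 100 classes, as in the question)

# Tiny config so the sketch runs on CPU without downloading a checkpoint.
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=128,
    num_labels=NUM_LABELS,
    pad_token_id=0,  # needed so the head can locate the last real token per row
    problem_type="multi_label_classification",  # selects a BCE-with-logits loss
)
model = LlamaForSequenceClassification(config)

# Fake batch of two "tokenized" texts; 0 is the padding id set above.
input_ids = torch.tensor([[5, 17, 42, 8, 0, 0], [9, 3, 77, 21, 64, 12]])
attention_mask = (input_ids != config.pad_token_id).long()

# Multi-hot float targets: one column per label, 1.0 where the label applies.
labels = torch.zeros(2, NUM_LABELS)
labels[0, [3, 57]] = 1.0
labels[1, 10] = 1.0

# One training step's worth of forward and backward pass.
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()

# Inference: independent sigmoid per label, thresholded at 0.5.
model.eval()
with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
predicted = torch.sigmoid(logits) > 0.5
```

For single-label multiclass classification, set `problem_type="single_label_classification"` and pass integer class ids as `labels`. Because the head has one output per label in the config, it can only ever predict labels from that fixed set, although labels with no training examples will still be predicted poorly.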

Thank you, I will try Annif and see if this helps for my use case.

I guess that using RAG for text classification would not give predictions as accurate as fine-tuning the LLM on my dataset.

I will figure this out and will post here if it works or not.