I need to know the best way to fine-tune LLMs for multiclass classification tasks with more than 100 classes. I assume that text generation is the main functionality of these LLMs, and most of the coding examples and documentation only cover text generation.
I know I can generate those labels by fine-tuning these text generation models on my dataset, but that will only train the LLM on the labels present in the training set, and some labels are still missing from my training data. The ideal scenario would be to have examples for every class in my training set, but that isn't the case right now. There might also be instances where these text generation models invent new classes of their own that don't exist at all.
I have seen on HF that there are model classes for these LLMs, like LlamaForSequenceClassification, with sequence classification heads, but I haven't found any example of how to use them.
I know that BERT models are used for sequence classification tasks, but can I do it with any of the LLMs? If possible, please share an example.
Thank you for your reply. I wanted to know if I can use LLMs like Llama2, Mistral, or Phi2 (decoder-only models) for text classification, just like you did here with BERT (which is an encoder model).
I have used Mistral for NER, which is a word-level classification task. I used RAG to retrieve similar queries (and their labels) and show the decoder model what I expect the output to be. You can then pass in the query and it should classify it according to the examples it has been shown.
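Roughly, the retrieval + few-shot prompting looks like the sketch below (a minimal, untested version; the model name, the example pool, and the labels are placeholders for whatever data you actually have):

```python
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small labelled pool to retrieve few-shot examples from (placeholder data).
example_pool = [
    ("Package arrived two weeks late", "shipping_delay"),
    ("I was charged twice for one order", "billing_issue"),
    ("The app crashes when I open settings", "bug_report"),
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
pool_embeddings = embedder.encode([text for text, _ in example_pool], convert_to_tensor=True)

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the k most similar labelled examples and build a few-shot prompt."""
    query_emb = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embeddings)[0]
    top_k = scores.topk(k).indices.tolist()
    shots = "\n\n".join(f"Text: {example_pool[i][0]}\nLabel: {example_pool[i][1]}" for i in top_k)
    return f"{shots}\n\nText: {query}\nLabel:"

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder; use whichever decoder model you have
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(build_prompt("My invoice shows a duplicate charge"), return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```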
I will say that while this works, it is certainly more natural to do this with an encoder like BERT.
If non-LLM approaches suit you, you could consider Annif; see the demo at annif.org.
Annif is developed for extreme multilabel classification of texts, so it is most suitable when there are thousands or tens of thousands of labels to choose from.
but that will only train the LLM on the labels present in the training set, and some labels are still missing from my training data.
Annif utilizes two kinds of algorithms in its backends: associative and lexical. An associative algorithm suggests only the labels it has seen in its training set, whereas a lexical algorithm can suggest any label from the vocabulary, because it learns features based on word positions etc. See Backend: MLLM · NatLibFi/Annif Wiki · GitHub for more details.
(That said, Annif could in the future use some LLM or BERT model for well-performing zero-shot analysis, which is why I'm interested in these topics.)
You can use a decoder-only model with LlamaForSequenceClassification, as in the link above. If Hugging Face has created a sequence classification class for a model, you can use it the same way.
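For reference, here is a minimal sketch of what that looks like (the model name and num_labels are placeholders; one gotcha is that Llama ships without a pad token, so you need to set one before batching):

```python
import torch
from transformers import AutoTokenizer, LlamaForSequenceClassification

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any Llama checkpoint you have access to
num_labels = 120                       # placeholder for your >100 classes

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = LlamaForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)
model.config.pad_token_id = tokenizer.pad_token_id  # the head reads the last non-pad token

# A forward pass with integer labels gives a standard classification loss,
# so this plugs straight into Trainer or your own training loop.
batch = tokenizer(["example text to classify"], return_tensors="pt", padding=True, truncation=True)
outputs = model(**batch, labels=torch.tensor([3]))
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, num_labels)
```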
I tried using the Llama sequence classification head, but it turned out that the predictions were not up to the mark and were worse than the BERT classification model's predictions.
It's better to use the AutoModelForCausalLM class for fine-tuning decoder-only LLMs, as they generate better predictions than BERT and than an LLM with a sequence classification head.
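In case it helps, this is roughly the setup I mean: format every example as a prompt that ends with the label and fine-tune with the ordinary causal-LM objective. The model name, prompt template, and toy rows below are placeholders, not my actual data:

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder decoder-only model
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Each training example is the text followed by its label, ending with EOS
# so the model learns to stop right after emitting the label.
raw = [
    {"text": "Package arrived two weeks late", "label": "shipping_delay"},
    {"text": "I was charged twice for one order", "label": "billing_issue"},
]

def to_features(row):
    prompt = f"Classify the text.\nText: {row['text']}\nLabel: {row['label']}{tokenizer.eos_token}"
    return tokenizer(prompt, truncation=True, max_length=512)

dataset = Dataset.from_list(raw).map(to_features, remove_columns=["text", "label"])

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf-finetune", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = shifted input ids
)
trainer.train()
trainer.save_model("clf-finetune")           # save the fine-tuned weights
tokenizer.save_pretrained("clf-finetune")    # and the tokenizer, for later inference
```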
Hi, I now have a very similar task. I'm not familiar with AutoModelForCausalLM, but I hope you succeed in what you're doing. If it's okay, please let me know if it works out.
I'm trying a RAG + GPT approach. Whether it works out or not, I'll post a comment here.
This notebook is available in the Hugging Face documentation for the Llama2 model (Llama2 (huggingface.co)), and I think this is the most appropriate way to do text classification if you want to use decoder-only models.
RAG and an AutoModelForSequenceClassification head on top of these decoder-only LLMs won't give the best results for text classification.
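One trick that helps with the "model invents a class that doesn't exist" problem from the original post: generate the label text, then snap it onto the closest entry in the known label list instead of trusting the raw string. A rough sketch, assuming the fine-tuned checkpoint and placeholder labels from the earlier snippet:

```python
import difflib
from transformers import AutoModelForCausalLM, AutoTokenizer

LABELS = ["shipping_delay", "billing_issue", "bug_report"]  # placeholder for the full label set

checkpoint = "clf-finetune"  # wherever the fine-tuned model and tokenizer were saved
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def classify(text: str) -> str:
    prompt = f"Classify the text.\nText: {text}\nLabel:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    raw = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
    # Snap the generated string onto the closest known label so we never return an invented class.
    return difflib.get_close_matches(raw, LABELS, n=1, cutoff=0.0)[0]

print(classify("My invoice shows a duplicate charge"))
```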