Looking for a multilingual model to categorize news articles by their titles, one that can be fine-tuned with labeled data. It shouldn't rely on word-based methods. I have used SetFit, but the results aren't great. It would be helpful if anyone could point me in the right direction.
I'm looking to fine-tune the model on news titles.
Using a simple prompt with Gemma / LLaMA will do the job just fine.
With a (large) set of few-shot examples you will see amazing results.
You can use the free tier on Groq or Gemini.
And you won't need to fine-tune.
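A rough sketch of what that prompting setup could look like, assuming the `groq` Python client and a Llama model ID from their free tier (the labels, few-shot examples, and model name here are placeholders, so check Groq's docs for current IDs):

```python
# Sketch of few-shot classification by prompting, via Groq's free tier.
# Assumptions: `pip install groq`, GROQ_API_KEY set in the environment,
# and the model ID below still being offered -- verify against their docs.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Hypothetical label set and few-shot examples; swap in your own.
FEW_SHOT = """Classify the news title into one of: politics, sports, business, tech.

Title: "Parliament passes new budget bill"
Label: politics

Title: "Local team wins the championship final"
Label: sports
"""

def classify(title: str) -> str:
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # assumed model ID
        messages=[{"role": "user",
                   "content": f'{FEW_SHOT}\nTitle: "{title}"\nLabel:'}],
        temperature=0.0,
        max_tokens=5,
    )
    return resp.choices[0].message.content.strip()

print(classify("Central bank raises interest rates"))
```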
If this is an academic project and you have limited resources, you can fine-tune an encoder for that.
SetFit is a great technique to do so, specifically when the number of examples per label is small.
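For reference, a minimal SetFit run looks roughly like this (a sketch only: the checkpoint is one multilingual option among many, the tiny dataset is illustrative, and the Trainer API differs slightly across setfit versions):

```python
# Minimal SetFit sketch (setfit >= 1.0 style API; older versions use
# SetFitTrainer instead). Checkpoint and data are placeholders.
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny illustrative dataset: a handful of examples per integer label.
train_ds = Dataset.from_dict({
    "text": ["Parliament passes budget", "Team wins the final",
             "Shares jump after earnings", "New phone unveiled"],
    "label": [0, 1, 2, 3],
})

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

print(model.predict(["Central bank raises rates"]))
```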
Flow for Training an Encoder:
- Set up your dataset as [title<string>] [label<int>] pairs.
- Divide the dataset into train-val-test.
- Fine-tune a multilingual BERT/BART/RoBERTa/… (a minimal sketch follows after this list).
- Go to the beach.
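A minimal sketch of that fine-tuning flow with Hugging Face transformers, assuming a CSV with title and integer label columns (the file name, checkpoint, and hyperparameters are placeholders):

```python
# Fine-tune a multilingual encoder on news titles.
# Assumes news_titles.csv with columns: title (string), label (int 0..K-1).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("csv", data_files="news_titles.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)  # train/val split

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch["title"], truncation=True, max_length=64)

dataset = dataset.map(tokenize, batched=True)

num_labels = len(set(dataset["train"]["label"]))
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=num_labels)

args = TrainingArguments(
    output_dir="news-classifier",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    eval_strategy="epoch",  # older transformers versions: evaluation_strategy
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables the default padding collator
)
trainer.train()
```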
Good luck.
Sahar
Thanks for the guidance.
Basically, I'm trying to train a model and set it up locally to process 1K+ news articles on a daily basis. SetFit gave good predictions at the start, but when new scenarios arise the accuracy falls. As for BERT, I'm training one now. Hope the results are good.
Please be advised that a model like SetFit can easily overfit to your training data (I mention this since you've indicated you are getting good initial predictions during training). It's a bit tricky to start with SetFit and contrastive learning, as you might need to be careful with your parameters.
Additionally, if you have time, consider using SetFit to create a few hundred labeled samples. Validate them manually and then use them to train another vanilla classification model (e.g., RoBERTa).
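That pseudo-labeling step could look roughly like this (a sketch assuming a SetFit model you've already trained and saved locally, and integer labels; the model path and file names are placeholders):

```python
# Use a trained SetFit model to pseudo-label unlabeled titles, then
# export them for manual validation before training a vanilla classifier.
import csv

from setfit import SetFitModel

# Placeholder path: a SetFit model saved via model.save_pretrained(...).
setfit_model = SetFitModel.from_pretrained("path/to/your-setfit-model")

# Replace with your own few hundred unlabeled titles.
unlabeled_titles = ["Central bank raises rates", "Team wins the final"]
preds = setfit_model.predict(unlabeled_titles)

with open("pseudo_labeled.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "label"])
    for title, label in zip(unlabeled_titles, preds):
        writer.writerow([title, int(label)])  # assumes integer labels
```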