Classification Problem - Which class of Hugging Face LLM models should I try?

SickPuppy96 · September 1, 2023, 2:21am

Hello

I was doing some research on how to improve my classification model (which currently uses text-embedding-ada-002), when I stumbled across the MTEB leaderboard.

For some additional context, I’m training a random forest machine learning model to solve a classification problem.

The input feature of the random forest model is a string containing the equipment description, e.g. Equipment Description: “Area Name: PASTE PLANT, Equipment Group: CONVEYOR, Equipment Name: Conveyor Mixer”. The output is a criticality classification, e.g. low, medium, high or severe. I am training the model on an existing dataset and then using it to extrapolate to other assets.

Because the input features are inconsistent (syntax and terminology vary between records), I use an LLM to convert the semantic meaning of each equipment description into a vector embedding. I then train the random forest on those embeddings as features (after PCA dimensionality reduction), and this is working extremely well.
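For readers less familiar with this setup, here is a minimal sketch of the embeddings → PCA → random forest pipeline described above, assuming scikit-learn. Real ada-002 vectors would come from the OpenAI API; here they are replaced by hypothetical synthetic 1536-dimensional vectors (the ada-002 output size) so the sketch runs offline. The helper names, `n_components=32` and `n_estimators=200` are illustrative assumptions, not the poster's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

LABELS = ["low", "medium", "high", "severe"]


def make_synthetic_embeddings(n_per_class=50, dim=1536, seed=0):
    """Hypothetical stand-in for ada-002 embeddings: one noisy cluster per criticality label."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label in LABELS:
        centre = rng.normal(size=dim)
        X.append(centre + 0.5 * rng.normal(size=(n_per_class, dim)))
        y.extend([label] * n_per_class)
    return np.vstack(X), np.array(y)


def build_classifier(n_components=32, seed=0):
    """PCA is fitted inside the pipeline, so it only ever sees the training split."""
    return make_pipeline(
        PCA(n_components=n_components, random_state=seed),
        RandomForestClassifier(n_estimators=200, random_state=seed),
    )


X, y = make_synthetic_embeddings()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = build_classifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out split
print(f"held-out accuracy: {accuracy:.2f}")
```

Putting PCA and the random forest in one `Pipeline` keeps the dimensionality reduction from being fitted on held-out data, and it makes swapping in a different embedding model a matter of changing only the feature matrix `X`.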

Question: Which class of Hugging Face LLM models should I try instead of OpenAI’s text-embedding-ada-002 model? I see there are tabs for Classification and Pair Classification. Are there any others I should try?

Question: What would you suggest I try for the task_objective? I see the examples given for the task objective are Represent This Sentence, or Represent This Document for Retrieval.
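As I understand it, the task_objective wording comes from INSTRUCTOR-style embedding models (e.g. `hkunlp/instructor-large`), whose README describes instructions of the form "Represent the (domain) (text_type) for (task_objective):". The sketch below fills that template and builds the `[instruction, text]` pairs such models take. The helper names and the wording "industrial equipment description for classifying criticality" are hypothetical choices to experiment with, not a recommended setting; the actual model call is left commented out because it needs the `InstructorEmbedding` package and a model download.

```python
def make_instruction(domain: str, text_type: str, task_objective: str = "") -> str:
    """Fill the INSTRUCTOR-style template 'Represent the (domain) (text_type) for (task_objective):'."""
    words = ["Represent the", domain, text_type]
    instruction = " ".join(w for w in words if w)  # skip empty slots
    if task_objective:
        instruction += f" for {task_objective}"
    return instruction + ":"


def to_instructor_pairs(instruction: str, texts: list[str]) -> list[list[str]]:
    """INSTRUCTOR-style models encode [instruction, text] pairs rather than bare strings."""
    return [[instruction, text] for text in texts]


# Hypothetical wording for this use case; compare a few variants by downstream accuracy.
instruction = make_instruction("industrial equipment", "description", "classifying criticality")
pairs = to_instructor_pairs(instruction, [
    "Area Name: PASTE PLANT, Equipment Group: CONVEYOR, Equipment Name: Conveyor Mixer",
])

# With the InstructorEmbedding package installed (downloads the model on first use):
# from InstructorEmbedding import INSTRUCTOR
# model = INSTRUCTOR("hkunlp/instructor-large")
# embeddings = model.encode(pairs)
```

Since the instruction changes the embedding, it is worth treating it like a hyperparameter and evaluating each variant with the same PCA + random forest setup.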

Question: Could this problem be solved by fine-tuning an LLM, avoiding the need for a separate machine learning model like random forest? I did try this with the recently released GPT-3.5 fine-tuning, but the results were very average and the random forest model performed better.

I’m very experienced with OpenAI’s range of models but very new to Hugging Face, and I’d love to learn more and explore what options Hugging Face has to offer.

Thank you kindly for any feedback.

Thanks in advance.

MattiLinnanvuori · September 3, 2023, 8:18am

You can use Hugging Face’s transformers library to classify texts, as described in the Text classification guide.
You could also use the Sentence Transformers library to produce embeddings, as described in Using Sentence Transformers at Hugging Face.
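To try a Sentence Transformers model in place of ada-002 without changing the rest of the pipeline, only the embedding step needs to be swapped. A minimal sketch, assuming an encoder object that exposes `encode(texts, batch_size=..., normalize_embeddings=...)` as `sentence_transformers.SentenceTransformer` does; `embed_descriptions` is a hypothetical helper name, and the model named in the comment is just one small example from the Hub, not a recommendation for this task.

```python
import numpy as np


def embed_descriptions(encoder, descriptions, batch_size=32):
    """Embed equipment descriptions with a sentence-transformers-style encoder.

    The returned (n_texts, dim) float32 matrix can be fed to the same
    PCA + random forest pipeline previously trained on ada-002 vectors.
    """
    vectors = encoder.encode(
        list(descriptions),
        batch_size=batch_size,
        normalize_embeddings=True,  # unit-length vectors, so cosine similarity equals the dot product
    )
    return np.asarray(vectors, dtype=np.float32)


# Example with a real model (requires `pip install sentence-transformers` and a model download):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# X = embed_descriptions(model, [
#     "Area Name: PASTE PLANT, Equipment Group: CONVEYOR, Equipment Name: Conveyor Mixer",
# ])
```

Because the classifier only sees a numeric matrix, several MTEB models can be compared by re-running the same training and evaluation with each model's embeddings.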

MattiLinnanvuori · September 3, 2023, 8:28am

SetFit is another option, a few-shot framework for text classification: SetFit
