Assigning Product Categories in a Large Catalog

IvanSanych · October 23, 2023, 5:30am

I want to categorize 10,000 new products into existing categories using machine learning. I have a training dataset and several candidate strategies: classic ML techniques, transformer-based embeddings, and pre-trained language models. I'm unsure which route to take. Can you help me decide?

I have a dataset of 1,000 products, each assigned to one of roughly 100 categories. Each product has a name, a description, and a price. I now want to categorize an additional 10,000 products with the same fields using a robust and reliable method. Each product should be assigned to exactly one category.

In a traditional machine learning approach, I might use libraries such as spaCy or NLTK to represent each product as a bag of words, train a classifier on this representation, and then apply the classifier to the new catalog.
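To make the first option concrete, here is a minimal sketch of what I have in mind. It assumes scikit-learn for both the bag-of-words step and the classifier (instead of spaCy or NLTK), uses TF-IDF weighting and logistic regression as one possible choice, and runs on a few made-up toy products rather than my real catalog. The helper names `product_text` and `train_bow_classifier` are hypothetical, and price is ignored here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def product_text(name, description):
    """Join the text fields into one string (price is left out of this sketch)."""
    return f"{name}. {description}"


def train_bow_classifier(texts, labels):
    """Fit a TF-IDF bag-of-words representation followed by a linear classifier."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


# Toy stand-in data (hypothetical, not the real catalog).
train_texts = [
    product_text("Wireless mouse", "Ergonomic optical mouse with USB receiver"),
    product_text("Bluetooth keyboard", "Compact keyboard for tablets and laptops"),
    product_text("Cotton t-shirt", "Short-sleeve crew neck shirt"),
    product_text("Denim jeans", "Slim fit blue jeans"),
]
train_labels = ["electronics", "electronics", "clothing", "clothing"]

model = train_bow_classifier(train_texts, train_labels)

# Apply the trained classifier to a new, unlabeled product.
new_texts = [product_text("Wireless keyboard", "Keyboard with USB receiver")]
predictions = model.predict(new_texts)
probabilities = model.predict_proba(new_texts)  # one row per product, one column per category
```

With about 10 labeled examples per category on average, I would expect to need cross-validation to judge whether this baseline is good enough.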

Alternatively, within the HuggingFace ecosystem, I could use a transformer to represent each product as a vector, which can subsequently be used with traditional machine learning methods. Or, I could directly apply a pre-trained model from HuggingFace.
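For the embedding option, this is roughly the shape I imagine, again only as a sketch. It assumes the sentence-transformers package and the `all-MiniLM-L6-v2` model as one example encoder (I have not settled on either), with scikit-learn's logistic regression on top. The function names are hypothetical, and the encoder is passed in as a plain function so that it can be swapped out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_sentence_transformer_embedder(model_name="all-MiniLM-L6-v2"):
    """Return a function that maps a list of strings to a 2-D array of embeddings.

    Assumes sentence-transformers is installed; the model name is only an example.
    """
    from sentence_transformers import SentenceTransformer  # optional dependency

    encoder = SentenceTransformer(model_name)
    return lambda texts: encoder.encode(list(texts), normalize_embeddings=True)


def train_on_embeddings(embed, texts, labels):
    """Embed the labeled products and fit a classical classifier on the vectors."""
    features = np.asarray(embed(texts))
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(features, labels)
    return classifier


def predict_categories(embed, classifier, texts):
    """Embed new products with the same encoder and predict one category each."""
    return classifier.predict(np.asarray(embed(texts)))
```

Usage would be something like `embed = make_sentence_transformer_embedder()`, then `clf = train_on_embeddings(embed, train_texts, train_labels)` and `predict_categories(embed, clf, new_texts)`, where the text for each product is its name and description joined together.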

Considering the above, I have the following questions:

1. Which of the three approaches would you recommend trying first: classic ML, transformer-generated embeddings with classic ML, or a pre-trained language model?
2. If you recommend the third approach (i.e., using a pre-trained model), can you suggest specific HuggingFace models suitable for this task?

Any insights or recommendations would be greatly appreciated.

Thank you!
