BERT job title classifier gives strange results when trained on a larger dataset

CaseJnr · June 16, 2024, 9:22am

I have attempted to build a BERT job title classifier based on some other examples. The Colab notebook can be viewed here:

https://colab.research.google.com/drive/1EXKrNIIaVLR9cnpysJCS-NnciksgMoZC?usp=sharing

When I train the model on a small 1k-entry dataset such as

https://drive.google.com/uc?export=download&id=1Q29nC9Y1x6QGwSiKfu2ie45RG_p-BQ7k

I get the expected results from my small test case:

programmer: Information & Communication Technology - Engineering - Software
builder: Manufacturing, Transport & Logistics - Machine Operators
mechanic: Trades & Services - Automotive Trades
office assistant: Administration & Office Support - Administrative Assistants
welder: Trades & Services - Welders & Boilermakers

However, if I train the model on a larger dataset containing 10k entries such as

https://drive.google.com/uc?export=download&id=1eFMpgiSmhsNoZCBPrwlJlPZ6Lp_Bqw2Y

my small test case gives incorrect results, with every title assigned to the same class:

programmer: Administration & Office Support - Administrative Assistants
builder: Administration & Office Support - Administrative Assistants
mechanic: Administration & Office Support - Administrative Assistants
office assistant: Administration & Office Support - Administrative Assistants
welder: Administration & Office Support - Administrative Assistants

So at this stage I am unsure whether the issue is in the code or in the dataset.
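
One way to start separating the two possibilities might be to look at how the labels are distributed in the larger dataset. If a single class dominates, a model that always predicts that class can already score well, which would look much like the output above. The following is a minimal sketch using only the standard library; the helper names and the example labels are made up for illustration and are not taken from the notebook or the dataset.

```python
from collections import Counter


def label_distribution(labels):
    """Return (label, count, share) tuples sorted by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent label."""
    dist = label_distribution(labels)
    return dist[0][2] if dist else 0.0


# Hypothetical labels standing in for the sub_classification_id column.
example_labels = (
    ["admin_assistant"] * 6 + ["software"] * 2 + ["welder", "automotive"]
)

for label, n, share in label_distribution(example_labels):
    print(f"{label:>16}  {n:>3}  {share:.0%}")

print("majority baseline accuracy:", majority_baseline_accuracy(example_labels))
```

If the majority baseline on the 10k dataset is close to the accuracy reported during training, the dataset balance is a likely suspect. If the classes are reasonably balanced, the training code (label encoding, learning rate, number of epochs) would be the next place to look.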

I am currently training the model using the sub_classification_id. Should I instead train the model on the parent classification_id, then perform further analysis to determine the sub_classification_id?
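
To make the two-stage idea concrete, here is a rough sketch of how the prediction step could be wired up: one model picks the parent classification, then a model specific to that parent picks the sub-classification. Everything here is an assumption for illustration. The function `predict_hierarchical` and the stub classifiers are hypothetical placeholders for fine-tuned BERT models, and the keyword rules exist only so the example runs on its own.

```python
from typing import Callable, Dict, Optional, Tuple

# A classifier is anything that maps a job title to a label string.
Classifier = Callable[[str], str]


def predict_hierarchical(
    title: str,
    parent_model: Classifier,
    sub_models: Dict[str, Classifier],
) -> Tuple[str, Optional[str]]:
    """Predict the parent class first, then the sub-class within that parent."""
    parent = parent_model(title)
    sub_model = sub_models.get(parent)
    # No sub-model registered for this parent: return only the parent label.
    sub = sub_model(title) if sub_model is not None else None
    return parent, sub


# Stub models (hypothetical): in practice these would wrap fine-tuned models.
def stub_parent_model(title: str) -> str:
    if title in {"mechanic", "welder"}:
        return "Trades & Services"
    return "Administration & Office Support"


def stub_trades_model(title: str) -> str:
    return "Welders & Boilermakers" if title == "welder" else "Automotive Trades"


sub_models = {"Trades & Services": stub_trades_model}

for title in ["welder", "mechanic", "office assistant"]:
    print(title, "->", predict_hierarchical(title, stub_parent_model, sub_models))
```

The trade-off is that each sub-model only has to separate a handful of classes, but any mistake at the parent level cannot be corrected at the second stage.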

If anyone could offer any guidance it would be highly appreciated.
