I’m working on building a system to predict column headers for CSV files based on the content of each column. The labels to find are First Name, Last Name, Company, Address 1, Address 2, City, State, and ZIP.
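For concreteness, here’s a made-up example of the kind of row I’m dealing with (all values are hypothetical), along with the labels I’d want predicted for each column:

```python
# A hypothetical input row; in practice the columns may appear in any order.
row = ["John", "Smith", "Acme Corp", "1234 Main St", "Suite 200",
       "San Francisco", "CA", "94105"]
# Desired labels, one per column:
# First Name, Last Name, Company, Address 1, Address 2, City, State, ZIP
```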
I’ve successfully fine-tuned a variant of BERT on my labeled training data, but still get poor predictions from it. I’ve also tried the Python package usaddress, which performs a similar task, but it was also unreliable.
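For anyone unfamiliar with usaddress: it tags tokens inside a single address string rather than classifying whole fields, roughly like this (a minimal sketch; the exact output labels depend on the input):

```python
import usaddress

# parse() splits a string into (token, label) pairs.
print(usaddress.parse("1234 Main St"))
# e.g. [('1234', 'AddressNumber'), ('Main', 'StreetName'), ('St', 'StreetNamePostType')]

# tag() groups consecutive tokens into components and guesses an address type.
components, address_type = usaddress.tag("1234 Main St, San Francisco, CA 94105")
print(address_type)  # e.g. 'Street Address'
print(components)    # OrderedDict mapping component names to text
```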
There are two aspects of this task that I believe explain why these don’t perform well for me. First, the CSV files I’ll be running predictions on won’t have a reliable column ordering; the columns can show up in any order. Second, these models rely heavily on context to make good predictions.
So if I pass an entire row from the CSV for a prediction and the ordering is scrambled, I get bad predictions because the disordered context confuses the model. The other option I found was making predictions on each individual field, i.e. asking it to predict a label for just a string like “1234 Main St” or “San Francisco” without including any other text from the row, but the models still don’t perform well on this: now they have no context at all, instead of incorrectly ordered context.
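To make the per-field option concrete, this is roughly what I mean (a minimal sketch assuming a fine-tuned sequence-classification checkpoint; the model path is a placeholder):

```python
from transformers import pipeline

# Placeholder path to my fine-tuned BERT variant; its labels would be the
# eight column headers listed above.
classifier = pipeline("text-classification", model="./my-finetuned-bert")

# Each cell is classified completely in isolation, so the model has no
# surrounding row context to attend to.
for field in ["1234 Main St", "San Francisco"]:
    print(field, "->", classifier(field))
```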
It seems that these types of models are overly complex for my task, and that the attention to context that normally helps them make better predictions is exactly what’s hurting mine: I’m either depriving the model of context or giving it disordered context.
I’m just looking for some insight into how to approach this kind of task, and/or whether there are better model choices than transformers, since those all rely on context. I’m still pretty new to machine learning, so apologies for any inaccuracies or if I didn’t explain this clearly enough.