Custom Entity Extraction from text

philomath · May 28, 2022, 6:29pm

Hi! I’m trying to build a learning-based custom entity extraction model that is capable of extracting a specific value from a short piece of text. In other words, I have a dataset that consists of two columns: “description” and “store_number”, and I want my model to be able to extract the store_number from any description it is given. For instance:

descriptions:
[“FIVE GUYS 2565 DIST 468-981-3409 AL”,
“McDonald’s K6148”]

store_numbers:
[2565,
K6148]

and so on. I’m having a hard time figuring out what model or type of model I should train to accomplish this. Initially, I looked at Named Entity Recognition (NER) and token classification, but that doesn’t seem to be the right approach, since 1) the pretrained BERT NER models I found are limited to labels like person, organization, etc., and 2) even if we could identify numbers with such an NER model, we aren’t interested in the category/type of an entity, but in correctly extracting the entity’s value itself.

Any help on this would be appreciated. Thanks!

HuggingFaceTamil · November 25, 2022, 6:05pm

Did you figure this out?

Ratansingh648 · November 5, 2023, 4:04am

You need to define a custom NER tag (label class) for store numbers. Here is a general outline of what to do:

Label the store number in every training example with a dedicated tag. For example:

Text : “FIVE GUYS 2565 DIST 468-981-3409 AL”
Tags: O O B-STORE O O O

Here B-STORE marks the store number, while O marks every other word, i.e. anything that is not part of an entity.
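Since the dataset already has a "description" and a "store_number" column, the tags can probably be generated automatically rather than by hand. Below is a minimal sketch, assuming the store number appears in the description as its own whitespace-separated word. The helper name `bio_tags` is made up for illustration; it produces one tag per word.

```python
def bio_tags(description, store_number, label="STORE"):
    """Split the description on whitespace and tag the word that equals the store number.

    Assumption: the store number shows up as a standalone word. If it does not,
    every tag stays "O" and the example should be reviewed manually.
    """
    words = description.split()
    tags = ["O"] * len(words)
    target = str(store_number)
    for i, word in enumerate(words):
        if word == target:
            tags[i] = f"B-{label}"
            break  # tag only the first exact match
    return words, tags


words, tags = bio_tags("FIVE GUYS 2565 DIST 468-981-3409 AL", "2565")
print(list(zip(words, tags)))
```

Rows where no word matches (all tags come back as O) are worth checking by hand, since the store number may be glued to punctuation or other characters.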

Tokenize and create a dataset of input_ids, attention_mask and labels. Use an encoder architecture such as BERT or XLM-RoBERTa to compute the embeddings of the input, and feed them to a fully connected token-classification layer. You might find the seqeval library helpful for computing entity-level metrics during training.
Train for a few epochs and you will have your custom NER model ready.
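One detail the outline skips: the encoder's tokenizer splits words into subword tokens, so the word-level tags have to be spread over those tokens before training. Here is a rough sketch of that step plus the reverse step of reading the store number back out of predicted tags. It assumes a fast Hugging Face tokenizer called as `tokenizer(words, is_split_into_words=True)`, whose `word_ids()` method maps each token to its word index. The `word_ids` list below is hand-written for illustration, not real tokenizer output, and the helper names are hypothetical.

```python
LABELS = ["O", "B-STORE", "I-STORE"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
IGNORE = -100  # label value that PyTorch's cross-entropy loss skips by default


def align_labels(word_ids, word_tags):
    """Turn word-level tags into one label id per subword token.

    word_ids holds, for each token, the index of the word it came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE)  # special token, excluded from the loss
        elif word_id != previous:
            aligned.append(LABEL2ID[word_tags[word_id]])  # first subword of a word
        else:
            aligned.append(IGNORE)  # later subwords of the same word are skipped
        previous = word_id
    return aligned


def extract_store_number(words, predicted_tags):
    """Join the words predicted as B-STORE/I-STORE, or return None if there are none."""
    picked = [w for w, t in zip(words, predicted_tags) if t.endswith("-STORE")]
    return " ".join(picked) if picked else None


words = ["FIVE", "GUYS", "2565", "DIST", "468-981-3409", "AL"]
tags = ["O", "O", "B-STORE", "O", "O", "O"]
# Hypothetical subword split: "2565" becomes 2 tokens, the phone number 3 tokens.
word_ids = [None, 0, 1, 2, 2, 3, 4, 4, 4, 5, None]

labels = align_labels(word_ids, tags)
print(labels)
print(extract_store_number(words, tags))
```

The resulting `labels` list is what goes next to input_ids and attention_mask in the dataset. This also covers the original concern about fixed categories: the label set is whatever you define (here O, B-STORE, I-STORE), and the extracted value is simply the text of the words the model tags as STORE.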
