Hello community,
I am working on a project that requires extraction of a specific value from a text. Here is an example:
“This job offers a salary of $60000 and additional benefits like equity, health insurance and a private apartment”.
I want to train a model that recognizes that $60000 is the salary of the job, and that also extracts the related benefits, such as the equity and health insurance.
I have already solved this with a large set of regular expressions and manual text extraction, but as you are aware, there is always that one example that breaks the system. Therefore, I am hoping that I can train a model to recognize these “tokens” instead.
So in my internal language, the “$60000” is a token of the type “salary_value” and “equity”, “health insurance” and “private apartment” are tokens of the type “benefits”. There are a couple of other token types, but for the example let’s stay with this.
I have a lot of training data where these are annotated: each annotation marks the text span that contains the token and the token type that is expected there.
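To make the annotation format concrete, here is a minimal sketch of how span annotations like these could be turned into per-word labels using the common BIO scheme (B = first word of a span, I = continuation, O = outside any span). The `(start, end, label)` tuple format, the `find_span` and `to_bio` helpers, and the simple regex word splitter are assumptions made for illustration only, not the actual data format or pipeline.

```python
import re

text = ("This job offers a salary of $60000 and additional benefits like "
        "equity, health insurance and a private apartment")

def find_span(text, phrase, label):
    """Hypothetical helper: build a (start_char, end_char, label) annotation."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

# Assumed annotation format: character offsets plus the token type.
spans = [
    find_span(text, "$60000", "salary_value"),
    find_span(text, "equity", "benefits"),
    find_span(text, "health insurance", "benefits"),
    find_span(text, "private apartment", "benefits"),
]

def to_bio(text, spans):
    """Split text into words/punctuation and give each piece a BIO tag."""
    tokens, tags = [], []
    # A word may carry a leading "$"; any other punctuation is its own token.
    for match in re.finditer(r"\$?\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

tokens, tags = to_bio(text, spans)
labelled = [(tok, tag) for tok, tag in zip(tokens, tags) if tag != "O"]
for tok, tag in labelled:
    print(tok, tag)
```

Other token types would be added to `spans` in the same way, each with its own label name.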
Can I use any of the Hugging Face libraries to build something like this? I have looked at the existing models, but they mostly focus on standard NER entity types like “location”, “name”, “company”, etc.
In short, I am looking for some guidance on what is best to use here.
Thanks!
Alex