How would you train a model for hard/soft skill detection based on a taxonomy?

ddeisadze · March 22, 2024, 4:25am

Hi!

I am trying to train a model to detect hard and soft skills from the ESCO taxonomy (see this example of what they provide for a skill, e.g. Python: https://esco.ec.europa.eu/sites/default/files/Python%20(computer%20programming).json).

My thinking is to train a spaCy model using its matcher on the preferred and alternate labels, with fuzzy matching on top.
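A minimal sketch of the label-matching idea, assuming spaCy v3 and its `PhraseMatcher`. The `SKILL_LABELS` dictionary is a hypothetical, hand-written stand-in for the preferred and alternate labels; it is not parsed from the real ESCO JSON, and the helper names are made up for illustration. This only does case-insensitive exact matching, so typos and inflected forms would still need a fuzzy step.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical stand-in for ESCO preferred/alternate labels, keyed by a skill id.
SKILL_LABELS = {
    "python": ["Python", "Python (computer programming)"],
    "teamwork": ["work in teams", "teamwork"],
}


def build_matcher(nlp, skill_labels):
    """Register every label of every skill as a phrase pattern."""
    # attr="LOWER" compares lowercased tokens, so "python" matches "Python".
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    for skill_id, labels in skill_labels.items():
        matcher.add(skill_id, [nlp.make_doc(label) for label in labels])
    return matcher


def find_skills(text, nlp, matcher):
    """Return (skill_id, matched_text) pairs found in the text."""
    doc = nlp(text)
    return [
        (nlp.vocab.strings[match_id], doc[start:end].text)
        for match_id, start, end in matcher(doc)
    ]


# A blank English pipeline only tokenizes, so no model download is needed.
nlp = spacy.blank("en")
matcher = build_matcher(nlp, SKILL_LABELS)
hits = find_skills("I built data pipelines in python and enjoy teamwork.", nlp, matcher)
print(hits)
```

Keying the patterns by skill id means every alternate label maps back to the same taxonomy concept, which is probably what you want when counting skills in a resume.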

But I am not sure how I can turn a skill's description into an embedding and then use that to perform NER. Any ideas?
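One way to use the descriptions is to skip NER for this part and instead rank skills by how similar a resume sentence is to each skill description. The sketch below illustrates only the ranking logic: its `embed` function is a toy bag-of-words vector, used here as a plainly labeled stand-in for a real sentence-embedding model, and the descriptions are invented examples rather than actual ESCO text.

```python
import math
import re
from collections import Counter

# Hypothetical descriptions written for this example, not copied from ESCO.
SKILL_DESCRIPTIONS = {
    "python": "writing and testing software code in the python programming language",
    "teamwork": "cooperating with colleagues and working together in a team towards a shared goal",
}


def embed(text):
    """Toy 'embedding': word counts. Swap in a real sentence encoder in practice."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_skills(sentence, descriptions):
    """Return (skill_id, score) pairs, most similar description first."""
    query = embed(sentence)
    scores = [(sid, cosine(query, embed(desc))) for sid, desc in descriptions.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_skills(
    "Developed and tested software code in a scripting language",
    SKILL_DESCRIPTIONS,
)
print(ranking)
```

With real embeddings you would likely also want a similarity threshold, so that sentences unrelated to any skill do not get assigned their nearest description anyway.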

For soft skills, ESCO has a similar classification: each entry has the skill and its description, and I would like to perform NER on resume text.

Would love some help on this.

juhoinkinen · March 22, 2024, 4:32pm

You could check if Annif would suit your needs. We are developing it for use with different ontologies.

You would need training examples of the concepts in the ontology, but maybe you could use the descriptions, or perhaps you already have some labeled data?
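As a rough sketch of using the descriptions as training examples: the code below turns skill records into tab-separated lines of the form text, tab, subject URI in angle brackets. That layout is my understanding of Annif's short-text TSV corpus format and should be checked against the Annif documentation before relying on it. The records and `example.org` URIs are hypothetical placeholders, not real ESCO identifiers.

```python
# Hypothetical skill records; real ones would come from the ESCO export.
SKILLS = [
    {
        "uri": "http://example.org/skill/python",
        "description": "Writing and testing software code\nin the Python programming language.",
    },
    {
        "uri": "http://example.org/skill/teamwork",
        "description": "Cooperating with colleagues\ttowards a shared goal.",
    },
]


def to_tsv_lines(skills):
    """Build one 'text<TAB><uri>' line per skill (assumed corpus layout)."""
    lines = []
    for skill in skills:
        # Collapse newlines and tabs so each document stays on a single line
        # and the only tab is the column separator.
        text = " ".join(skill["description"].split())
        lines.append(f"{text}\t<{skill['uri']}>")
    return lines


tsv_lines = to_tsv_lines(SKILLS)
print("\n".join(tsv_lines))
```

One description per concept is very little training data, so this is more of a starting point than a full corpus.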

ddeisadze · March 23, 2024, 5:09pm

Thanks! I’ll check it out. Do you know any good resources that explain training a custom vocabulary in spaCy? If I train the spaCy vocab on those terms and their descriptions, is it more likely to recognize the skills via vectors?

juhoinkinen · March 24, 2024, 8:15am

I don’t know about spaCy, and just to be clear, Annif is a separate tool, not part of spaCy. But we have a tutorial for Annif that also explains how to set up a training corpus; maybe it helps: Annif-tutorial/exercises/OPT_custom_corpus.md (NatLibFi/Annif-tutorial on GitHub)
