How would you train a model for hard/soft skill detection based on a taxonomy?


So I am trying to train a model to detect hard and soft skills from the ESCO taxonomy (see this example of what they provide for a skill, e.g. Python).

My thinking is to train a spaCy model using its matcher on the preferred and alternate labels, combined with fuzzy matching.
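To make the label-matching idea concrete, here is a minimal sketch of fuzzy matching against preferred and alternate labels. It uses `difflib` from the Python standard library as a stand-in, not spaCy's matcher, and the `SKILL_LABELS` dictionary, the `find_skills` helper, and the `0.85` threshold are all assumptions for illustration rather than anything ESCO or spaCy provides.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of taxonomy labels: skill -> preferred and alternate labels.
SKILL_LABELS = {
    "Python (computer programming)": ["python", "python programming"],
    "project management": ["project management", "manage projects"],
}

def find_skills(text, labels=SKILL_LABELS, threshold=0.85):
    """Return (skill, matched_span, score) for token windows that fuzzily match a label."""
    tokens = text.lower().split()
    hits = []
    for skill, variants in labels.items():
        for label in variants:
            n = len(label.split())
            # Slide a window with as many tokens as the label has.
            for i in range(len(tokens) - n + 1):
                span = " ".join(tokens[i:i + n]).strip(".,;:()")
                score = SequenceMatcher(None, span, label).ratio()
                if score >= threshold:
                    hits.append((skill, span, round(score, 2)))
    return hits

resume = "Five years of Pyhton programming and project managment experience."
for hit in find_skills(resume):
    print(hit)
```

The threshold trades recall for precision: a lower value catches more typos but also more false matches, so it would need tuning on real resumes.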

But I am not sure how to turn the description of a skill into an embedding and then use that to perform NER. Any ideas?
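One way to picture using the descriptions is to rank skills by how similar each description is to a resume sentence. The sketch below is only that: it scores whole sentences rather than tagging spans as NER would, and plain word counts stand in for a real embedding. The `DESCRIPTIONS` texts are paraphrased placeholders, and `vectorize`, `cosine`, and `rank_skills` are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Hypothetical skill descriptions, paraphrased for illustration only.
DESCRIPTIONS = {
    "Python": "techniques and principles of software development such as "
              "analysis algorithms coding testing in python",
    "teamwork": "work confidently within a group with each member doing "
                "their part in service of the whole team",
}

def vectorize(text):
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_skills(sentence, descriptions=DESCRIPTIONS):
    """Rank skills by similarity between their description and the sentence."""
    sent_vec = vectorize(sentence)
    scores = {skill: cosine(sent_vec, vectorize(desc)) for skill, desc in descriptions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_skills("Wrote and tested algorithms in Python for data analysis"))
```

Swapping `vectorize` for a sentence-embedding model would keep the same ranking logic while capturing paraphrases that share no words with the description.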

For soft skills, ESCO has a similar classification: each entry has the skill and its description, and I would like to perform NER on resume text.

Any ideas? I would love some help on this.

You could check if Annif would suit your needs. We are developing it for use with different ontologies.

You would need training examples for the concepts in the ontology, but maybe you could use the descriptions, or perhaps you already have some labeled data too?
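As a rough sketch of using the descriptions as training examples, the snippet below writes one line per skill with the description text, a tab, and the concept URI in angle brackets. That layout is my understanding of Annif's TSV document corpus format and should be checked against the Annif documentation. The `SKILLS` records, the `example.org` URIs, and the `build_tsv` helper are hypothetical.

```python
# Hypothetical skill records; the URIs are placeholders, not real ESCO identifiers.
SKILLS = [
    {"uri": "http://example.org/skill/python",
     "description": "Techniques and principles of software development\nin Python."},
    {"uri": "http://example.org/skill/teamwork",
     "description": "Work confidently within a group, each doing their part."},
]

def build_tsv(skills):
    """Return corpus text: one 'description<TAB><uri>' line per skill."""
    lines = []
    for skill in skills:
        # Collapse tabs and newlines so each record stays on a single line.
        text = " ".join(skill["description"].split())
        lines.append(f"{text}\t<{skill['uri']}>")
    return "\n".join(lines) + "\n"

print(build_tsv(SKILLS), end="")
```

With only one description per concept the corpus is very small, so any labeled resumes you have would be worth adding in the same format.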

Thanks! I’ll check it out. Do you know any good resources that explain training a custom vocabulary in spaCy? If I train the spaCy vocab on those terms and the descriptions, is it more likely to recognize the skills via vectors?

I don’t know about spaCy, and just to be clear, Annif is a separate tool, not part of spaCy. But we have a tutorial for Annif that also explains how to set up a training corpus; maybe it helps: Annif-tutorial/exercises/ at master · NatLibFi/Annif-tutorial · GitHub