Annif - toolkit for multilabel text classification

We are pleased to announce the release of Annif 1.1!

Annif is a multi-algorithm automated subject indexing tool intended for libraries, archives and museums. It suggests subjects or topics from a predefined vocabulary, which can be a thesaurus, an ontology or just a list of subjects. The number of subjects in the vocabulary can be large, tens of thousands or even more, so the task Annif performs can be called extreme multilabel classification.

Annif relies on traditional machine learning techniques rather than LLMs, which makes inference very fast: it typically suggests subjects for a text the length of a PDF of tens of pages in less than one second. Annif has a CLI for administrative tasks and a REST API for end users. Its development started and continues at the National Library of Finland, but everyone is welcome to join in!
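To give an idea of what using the REST API looks like, here is a minimal Python sketch that asks one project for subject suggestions. The base URL (http://localhost:5000) and the project ID (yso-en) are placeholders for illustration, not part of this announcement; the sketch assumes the suggest endpoint accepts form-encoded text and limit fields and returns a JSON object with a results list of label/score entries, so please check the API documentation of your own Annif instance before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_suggest_request(base_url, project_id, text, limit=5):
    """Build a POST request for the suggest endpoint of one Annif project.

    Assumption: the endpoint lives at /v1/projects/<project_id>/suggest
    and takes form-encoded `text` and `limit` fields.
    """
    url = f"{base_url.rstrip('/')}/v1/projects/{urllib.parse.quote(project_id)}/suggest"
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def extract_suggestions(payload):
    """Return (label, score) pairs from a suggest response, highest score first.

    Assumption: the response is a JSON object whose `results` list holds
    entries with `label` and `score` keys.
    """
    results = payload.get("results", [])
    pairs = [(r["label"], r["score"]) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def suggest(base_url, project_id, text, limit=5):
    """Send the text to a running Annif instance and return its suggestions."""
    request = build_suggest_request(base_url, project_id, text, limit)
    with urllib.request.urlopen(request) as response:
        return extract_suggestions(json.load(response))


if __name__ == "__main__":
    # Placeholder URL and project ID; replace with your own instance and project.
    for label, score in suggest("http://localhost:5000", "yso-en", "Cats and dogs as pets"):
        print(f"{score:.3f}  {label}")
```

Building the request and parsing the response are kept as separate functions so that both can be tried out without a running server.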

Regarding Hugging Face, Annif 1.1 introduces the annif upload and annif download commands, which can be used to push and pull a set of selected projects and vocabularies to and from a Hugging Face Hub repository.
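As a rough illustration of how the two commands could be scripted, the Python sketch below wraps them with subprocess. It assumes each command takes a project ID pattern followed by a Hub repository ID as positional arguments; the pattern yso-* and the repository example-org/annif-models are made-up placeholders, and the helper names are hypothetical. Run annif upload --help and annif download --help to confirm the exact arguments and options for your version.

```python
import shlex
import subprocess


def hub_command(action, project_ids_pattern, repo_id):
    """Build the argument list for `annif upload` or `annif download`.

    Assumption: both commands take a project ID pattern and a Hugging Face
    Hub repository ID as positional arguments, in that order.
    """
    if action not in ("upload", "download"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["annif", action, project_ids_pattern, repo_id]


def run_hub_command(action, project_ids_pattern, repo_id):
    """Run the command and raise CalledProcessError if Annif reports a failure."""
    command = hub_command(action, project_ids_pattern, repo_id)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder pattern and repository ID, for illustration only.
    run_hub_command("upload", "yso-*", "example-org/annif-models")
    run_hub_command("download", "yso-*", "example-org/annif-models")
```

Passing the arguments as a list avoids shell expansion of the wildcard, so the pattern reaches Annif unchanged.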

Check out these resources:

PS: Maybe someone could forward the message above to the HF posts feed? I'm still on the waitlist for it, so I can't post there myself.