NER on multiple languages

I want to do NER on news articles in dozens of languages. Is the best option to go for xlm-roberta-large-finetuned-conll03-english? I read that XLM models fine-tuned on one language also work well on other languages. My main issue is that this model is too big. Should I go for smaller language-specific models if I already know which language I'm dealing with?
Also I'm curious: why does xlm-roberta-large-finetuned-conll03-german have so many more downloads than the English one?

Hi @goutham794,

you could train a multi-lingual NER model on the WikiANN dataset (or better: use the train/dev/test partitions from

But fine-tuning one big multi-lingual NER model can be tricky (fine-tuning instabilities). And you should keep in mind that WikiANN only has three label types (PER, ORG and LOC).
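To make the three-label restriction concrete, here is a small sketch of the tag set WikiANN supports (IOB2 scheme; the label order shown matches the dataset as distributed on the Hugging Face Hub, but double-check against the dataset features before relying on the indices):

```python
# WikiANN covers only three entity types (PER, ORG, LOC), tagged in IOB2,
# so a model trained on it can only ever predict these seven labels.
WIKIANN_TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

label2id = {tag: i for i, tag in enumerate(WIKIANN_TAGS)}
id2label = {i: tag for tag, i in label2id.items()}

# CoNLL-2003 additionally has a MISC type, which WikiANN cannot represent:
entity_types = {tag.split("-")[-1] for tag in WIKIANN_TAGS}
print("MISC" in entity_types)  # prints False
```

So if your articles contain entities you care about beyond persons, organizations and locations (dates, products, events, ...), a WikiANN-trained model will simply never emit them.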

If you already know which languages you want to cover, then a better way would be to train "mono-lingual" models and search for NER datasets for your desired languages. A good resource is:
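In practice this mono-lingual route often ends up as a small routing layer: detect (or already know) the language, load the matching small model, and fall back to the big multilingual checkpoint only for uncovered languages. A minimal sketch (the per-language checkpoint names below are placeholders, not recommendations; only the fallback name comes from the thread):

```python
# Hypothetical router: prefer a smaller language-specific NER checkpoint when
# the article's language is known, otherwise fall back to the large
# multilingual XLM-R model mentioned above.
MONOLINGUAL_NER_MODELS = {
    "en": "some-org/english-ner-model",  # placeholder checkpoint name
    "de": "some-org/german-ner-model",   # placeholder checkpoint name
}
FALLBACK_MODEL = "xlm-roberta-large-finetuned-conll03-english"

def pick_ner_model(lang_code: str) -> str:
    """Return the checkpoint name to load for a given ISO language code."""
    return MONOLINGUAL_NER_MODELS.get(lang_code, FALLBACK_MODEL)

print(pick_ner_model("de"))  # a small German-specific model
print(pick_ner_model("fi"))  # no Finnish entry -> multilingual fallback
```

The returned name would then be passed to whatever loading mechanism you use (e.g. a token-classification pipeline), so the large model is only pulled in when no smaller option exists.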