Create a NER tagger for African languages

lewtun · November 9, 2021, 7:28pm

Please read the topic category description to understand what this is all about

Description

Africa has over 2,000 spoken languages, but these languages are massively underrepresented in NLP research and datasets. The goal of this project is to train strong models for the MasakhaNER corpus, which is a high-quality dataset for named entity recognition in 10 African languages.

Model(s)

There are a few popular multilingual models that you can start with:

Datasets

masakhaner

Challenges

It is unlikely that all ten languages in MasakhaNER are represented in multilingual models like XLM-R or mBERT, so some decisions will need to be made about which subsets to focus on.

Desired project outcomes

Create a Streamlit or Gradio app on Spaces that can take text from one or more of the languages in MasakhaNER and extract person names (PER), organizations (ORG), locations (LOC), and dates and times (DATE).
Don’t forget to push all your models and datasets to the Hub so others can build on them!
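
For the app, the per-token tags predicted by a model need to be grouped into whole entities before they can be displayed. Here is a minimal sketch of that step. It assumes the model emits BIO-style tags (`B-PER`, `I-PER`, `O`, and so on) over the four entity types listed above; the function name `bio_to_spans` and the example sentence are made up for illustration and are not taken from MasakhaNER.

```python
from typing import List, Tuple

# Assumption: BIO tags over the four entity types named in the project outcomes.
ENTITY_TYPES = {"PER", "ORG", "LOC", "DATE"}


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity that is currently open, if there is one.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label in ENTITY_TYPES:
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = label, [token]
        elif prefix == "I" and label == current_type:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", unknown tags, and stray I- tags end any open entity
            # and are not added to the output.
            flush()
    flush()
    return spans


# Made-up example sentence, for illustration only.
example_tokens = ["Amina", "Yusuf", "visited", "Lagos", "on", "9", "November"]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE", "I-DATE"]
example_spans = bio_to_spans(example_tokens, example_tags)
print(example_spans)
```

The resulting list of `(type, text)` pairs can then be rendered however the Streamlit or Gradio front end prefers, for example as a table or as highlighted text.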

Additional resources

MasakhaNER: Named Entity Recognition for African Languages

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

Follow the instructions on the #join-course channel
Join the african-ner channel

Just make sure you comment here to indicate that you’ll be contributing to this project

Team organization on the Hub

To join this team, make sure you join the following organisation on the Hub:

team-african-ner (🤗 Course Team African NER)

seanbenhur · November 13, 2021, 7:13am

I am interested in this project. I don’t know any African languages, but I would be delighted to create something useful for the community. Count me in!

lewtun · November 13, 2021, 7:35pm

Great to hear that you’re interested in this project @seanbenhur! I’ve created a Discord channel (info in the topic description) in case you and others want to use it to coordinate

lewtun · November 14, 2021, 4:58pm

Hey @seanbenhur, I’ve created an organisation on the Hub for your team so that you can push your models there and deploy your Streamlit / Gradio application

See the topic description for the link (I’ve already sent you an invite)

seanbenhur · November 15, 2021, 4:30am

Thank you!
