Create a NER tagger for African languages

:wave: Please read the topic category description to understand what this is all about

Description

Africa has over 2,000 spoken languages, but these languages are massively underrepresented in NLP research and datasets. The goal of this project is to train strong models for the MasakhaNER corpus, which is a high-quality dataset for named entity recognition in 10 African languages.

Model(s)

There are a few popular multilingual models that you can start with:

Datasets

Challenges

It is unlikely that all ten languages in MasakhaNER are represented in multilingual models like XLM-R or mBERT, so some decisions will need to be made about which subset of languages to focus on.
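
One rough way to compare candidate languages is to check how heavily a model's tokenizer fragments their words. The sketch below computes the average number of subword pieces per whitespace-separated word. The function name `subword_fertility` and the toy tokenizer are hypothetical, and the measure itself is only a heuristic proxy for how well a language is covered, not an established selection criterion for this project.

```python
from typing import Callable, Iterable, List


def subword_fertility(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> float:
    """Average number of subword pieces per whitespace-separated word."""
    n_words = 0
    n_pieces = 0
    for sentence in sentences:
        for word in sentence.split():
            n_words += 1
            n_pieces += len(tokenize(word))
    # Guard against empty input so we never divide by zero.
    return n_pieces / n_words if n_words else 0.0


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in tokenizer: chunks of at most 3 characters."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


if __name__ == "__main__":
    print(subword_fertility(["abc defghi"], toy_tokenize))
```

To try this with a real model, you could pass the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained("xlm-roberta-base")` from :hugs: Transformers instead of `toy_tokenize`. A noticeably higher value for one language than for another suggests the vocabulary covers it less well.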

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that can take text in one or more of the MasakhaNER languages and extract the person (PER), organization (ORG), location (LOC) and date & time (DATE) entities.
  • Don’t forget to push all your models and datasets to the Hub so others can build on them!
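
For the app, the per-token predictions need to be merged into whole entities before they are shown to the user. Here is a minimal sketch of that step, assuming the model emits BIO-style tags such as `B-PER` / `I-PER` / `O`. The function name `group_entities` and the example sentence are made up for illustration.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            entities.append((" ".join(current_tokens), current_label))
        current_tokens = []
        current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_label:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        elif tag.startswith("B-") or tag.startswith("I-"):
            # A new entity starts here (a stray I- tag is treated like B-).
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        else:
            # "O" or anything unexpected closes the open entity.
            flush()
    flush()
    return entities


if __name__ == "__main__":
    tokens = ["Amina", "visited", "Lagos", "on", "12", "May"]
    tags = ["B-PER", "O", "B-LOC", "O", "B-DATE", "I-DATE"]
    print(group_entities(tokens, tags))
```

The resulting list of pairs can be rendered directly in a Streamlit or Gradio component. If you use a Transformers token-classification pipeline, it can do a similar grouping for you, so treat this as a way to see what is happening under the hood.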

Additional resources

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel
  • Join the african-ner channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:

Team organization on the Hub

To join this team, make sure you join the following organisation on the Hub:

I am interested in this project, don’t know African languages, but would be delighted to create something useful for the community, Count me in!

Great to hear that you’re interested in this project @seanbenhur! I’ve created a Discord channel (info in the topic description) in case you and others want to use it to coordinate :slight_smile:

Hey @seanbenhur, I’ve created an organisation on the Hub for your team so that you can push your models there and deploy your Streamlit / Gradio application :slight_smile:

See the topic description for the link (I’ve already sent you an invite).

Thank you!