Create a multilingual classifier

:wave: Please read the topic category description to understand what this is all about

Description

Many countries have populations that speak and write in more than one language. Building NLP applications in these conditions can be challenging, especially if the languages differ significantly from each other. The goal of this project is to explore the effectiveness of multilingual Transformer models by training a classifier that can analyze texts in multiple languages at once.
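One way to make "effectiveness" concrete is to report accuracy per language rather than a single pooled number, since a pooled score can hide a language the model handles poorly. The following is only a minimal sketch: the function name `accuracy_by_language` and the toy predictions are assumptions for illustration, not part of the original project description.

```python
from collections import defaultdict


def accuracy_by_language(predictions, references, languages):
    """Return {language: accuracy} for parallel lists of predictions, labels, and language codes."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, ref, lang in zip(predictions, references, languages):
        total[lang] += 1
        correct[lang] += int(pred == ref)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy values, invented purely to show the call.
preds = [1, 0, 1, 1]
refs = [1, 0, 0, 1]
langs = ["de", "de", "fr", "fr"]
scores = accuracy_by_language(preds, refs, langs)
```

Comparing these per-language scores against a monolingual baseline for each language is one possible way to judge whether the multilingual model is worth the trade-off.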

Model(s)

There are a few popular multilingual models that you can start with:

Datasets

There are several multilingual datasets on the Hub that you can use to get started:

Even better would be to create a multilingual dataset in your own languages!
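If you do assemble your own data, it may help to keep a language column and split per language, so that every language shows up in both the training and the test set. The sketch below assumes your examples are available as `(text, label)` pairs grouped by language code; the helper `merge_and_split` and the toy sentences are hypothetical.

```python
import random


def merge_and_split(examples_by_lang, test_fraction=0.2, seed=0):
    """Merge per-language examples into train/test lists of dicts.

    examples_by_lang maps a language code to a list of (text, label) pairs.
    Each language contributes at least one example to the test split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for lang, examples in examples_by_lang.items():
        rows = [{"text": text, "label": label, "lang": lang} for text, label in examples]
        rng.shuffle(rows)
        n_test = max(1, round(len(rows) * test_fraction))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    # Mix languages so training batches are not monolingual.
    rng.shuffle(train)
    return train, test


# Toy data, invented for illustration only.
toy = {
    "de": [("Der Film war großartig.", 1), ("Das Essen war schrecklich.", 0)],
    "fr": [("Le film était génial.", 1), ("Le repas était horrible.", 0)],
}
train_rows, test_rows = merge_and_split(toy, test_fraction=0.5)
```

The resulting lists of dicts can then be converted to whatever dataset format you plan to push to the Hub.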

Challenges

  • Current multilingual models typically cover only around 100 languages. Check the corresponding papers to see whether your language is supported.

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that can automatically classify text in multiple languages.
  • Don’t forget to push all your models and datasets to the Hub so others can build on them!
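As a rough starting point for the app, here is a sketch of a Gradio interface wrapped around a text-classification pipeline. It assumes recent versions of `gradio` and `transformers` are installed (in particular, that `pipeline` accepts `top_k=None` to return a score for every label), and the names `format_scores` and `build_demo` are made up for this example.

```python
def format_scores(predictions):
    """Convert pipeline output into the {label: score} dict that gr.Label can display."""
    # Depending on the transformers version, the per-label list may be wrapped in an outer list.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return {p["label"]: float(p["score"]) for p in predictions}


def build_demo(model_id):
    """Build a Gradio interface for the text-classification checkpoint `model_id`."""
    # Imported lazily so format_scores can be used without these packages installed.
    import gradio as gr
    from transformers import pipeline

    # top_k=None asks the pipeline for scores of all labels (assumed: recent transformers).
    classifier = pipeline("text-classification", model=model_id, top_k=None)

    def classify(text):
        return format_scores(classifier(text))

    return gr.Interface(
        fn=classify,
        inputs=gr.Textbox(lines=3, label="Text in any supported language"),
        outputs=gr.Label(label="Predicted class"),
    )
```

In a Space's `app.py` you would then call something like `build_demo("your-username/your-multilingual-classifier").launch()`, where the model ID is a placeholder for the checkpoint you pushed to the Hub.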

Additional resources

  • This project has some overlap with the summarization section of Chapter 7 in the course.

Hi Lewtun,

Is there a notebook with a solution to this task?

Kind regards

Hi @mox, I’m not aware of a fully worked-out solution to this project, but you could use this notebook as a starting point.
