Description
Many countries have populations that speak and write in more than one language. Building NLP applications for these settings can be challenging, especially if the languages differ significantly from each other. The goal of this project is to explore how effective multilingual Transformer models are by training a single classifier that can analyze texts in multiple languages at once.
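To make the "one model, many languages" idea concrete, here is a minimal sketch assuming the transformers library is installed. The nlptown/bert-base-multilingual-uncased-sentiment checkpoint is used purely as an example of a multilingual classifier already on the Hub; any multilingual sequence-classification model would do.

```python
# A minimal sketch of the core idea: one multilingual checkpoint classifying
# text written in several languages. The checkpoint below is only an example.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment",  # example checkpoint
)

examples = [
    "This course is amazing!",        # English
    "Ce cours est vraiment nul.",     # French
    "Dieser Kurs ist ziemlich gut.",  # German
]

for text, prediction in zip(examples, classifier(examples)):
    print(f"{text!r} -> {prediction['label']} ({prediction['score']:.2f})")
```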
Model(s)
There are a few popular multilingual models that you can start with, for example mBERT (bert-base-multilingual-cased) and XLM-RoBERTa (xlm-roberta-base).
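As a rough sketch of the usual starting point, the snippet below loads one of these checkpoints for fine-tuning with the transformers Auto classes. The checkpoint name and the number of labels are assumptions you would adapt to your task.

```python
# A minimal sketch of loading a multilingual checkpoint for fine-tuning.
# xlm-roberta-base is an example; num_labels depends on your classification task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# The same tokenizer handles many languages thanks to its shared subword vocabulary.
print(tokenizer.tokenize("Multilingual models are fun!"))
print(tokenizer.tokenize("Les modèles multilingues sont amusants !"))
```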
Datasets
There are several multilingual datasets on the Hub that you can use to get started, such as XNLI or PAWS-X.
Even better would be to create a multilingual dataset in your own languages!
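If you go with an existing dataset, the datasets library can pull it straight from the Hub. The sketch below uses XNLI as an example (its configurations are language codes) and mixes two languages into one training set, which matches the goal of classifying multiple languages at once; adjust the names and splits to your own choice.

```python
# A sketch of building a multilingual training set from the Hub, using XNLI as
# an example; any multilingual dataset with a label column would work.
from datasets import load_dataset, concatenate_datasets

# Each XNLI configuration is a language code ("en", "fr", "de", ...).
french = load_dataset("xnli", "fr", split="train")
german = load_dataset("xnli", "de", split="train")

# Mix the languages so the classifier sees all of them during training.
multilingual_train = concatenate_datasets([french, german]).shuffle(seed=42)

print(multilingual_train[0])        # premise / hypothesis / label
print(multilingual_train.features)  # label names for the classification task
```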
Challenges
- The current multilingual models are typically limited to around 100 languages. Check the corresponding papers to see whether your language is supported; a quick tokenizer-based sanity check is also sketched below.
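Beyond reading the papers, a rough heuristic (an assumption on my part, not an official check) is to look at how the model's tokenizer fragments text in your language: heavy fragmentation or unknown tokens usually indicate that the language was poorly represented in pretraining.

```python
# A rough sanity check of tokenizer coverage for a given language.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # example checkpoint

sentence = "Replace this with a sentence in the language you want to check."
tokens = tokenizer.tokenize(sentence)

print(tokens)
print(f"{len(tokens)} tokens for {len(sentence.split())} words")
print("Contains unknown tokens:", tokenizer.unk_token in tokens)
```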
Desired project outcomes
- Create a Streamlit or Gradio app on Spaces that can automatically classify text in multiple languages (a minimal Gradio starting point is sketched after this list).
- Don't forget to push all your models and datasets to the Hub so others can build on them!
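For the first outcome, the sketch below shows roughly what the app.py of a Gradio Space could look like; the identifier your-username/your-model is a placeholder for a classifier you have fine-tuned yourself. For the second outcome, the push_to_hub methods on models, tokenizers, and datasets cover the uploading step.

```python
# A minimal Gradio sketch for a Spaces demo. The model identifier is a
# placeholder for your own fine-tuned multilingual classifier on the Hub.
import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="your-username/your-model")

def classify(text):
    # Return the predicted label and its score for display in a gr.Label widget.
    prediction = classifier(text)[0]
    return {prediction["label"]: float(prediction["score"])}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Textbox(lines=3, label="Text in any of your languages"),
    outputs=gr.Label(label="Predicted class"),
    title="Multilingual text classifier",
)

demo.launch()
```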
Additional resources
- This project has some overlap with the summarization section of Chapter 7 in the course.