Description
Many countries have populations that speak and write in more than one language. Building NLP applications for these settings can be challenging, especially if the languages differ significantly from each other. The goal of this project is to explore how effective multilingual Transformer models are by training a single classifier that can analyze texts in multiple languages at once.
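To make the "one model, many languages" idea concrete, here is a minimal sketch assuming the transformers library is installed. The nlptown/bert-base-multilingual-uncased-sentiment checkpoint is used purely as an example of a multilingual classifier already on the Hub; any multilingual sequence-classification model would do.

```python
# A minimal sketch of the core idea: one multilingual checkpoint classifying
# text written in several languages. The checkpoint below is only an example.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment",  # example checkpoint
)

examples = [
    "This course is amazing!",        # English
    "Ce cours est vraiment nul.",     # French
    "Dieser Kurs ist ziemlich gut.",  # German
]

for text, prediction in zip(examples, classifier(examples)):
    print(f"{text!r} -> {prediction['label']} ({prediction['score']:.2f})")
```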
Model(s)
There are a few popular multilingual models that you can start with, for example mBERT (bert-base-multilingual-cased) and XLM-RoBERTa (xlm-roberta-base).
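As a rough sketch of the usual starting point, the snippet below loads one of these checkpoints for fine-tuning with the transformers Auto classes. The checkpoint name and the number of labels are assumptions you would adapt to your task.

```python
# A minimal sketch of loading a multilingual checkpoint for fine-tuning.
# xlm-roberta-base is an example; num_labels depends on your classification task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# The same tokenizer handles many languages thanks to its shared subword vocabulary.
print(tokenizer.tokenize("Multilingual models are fun!"))
print(tokenizer.tokenize("Les modèles multilingues sont amusants !"))
```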
Datasets
There are several multilingual datasets on the Hub that you can use to get started, such as XNLI or PAWS-X.
Even better would be to create a multilingual dataset in your own languages!
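If you go with an existing dataset, the datasets library can pull it straight from the Hub. The sketch below uses XNLI as an example (its configurations are language codes) and mixes two languages into one training set, which matches the goal of classifying multiple languages at once; adjust the names and splits to your own choice.

```python
# A sketch of building a multilingual training set from the Hub, using XNLI as
# an example; any multilingual dataset with a label column would work.
from datasets import load_dataset, concatenate_datasets

# Each XNLI configuration is a language code ("en", "fr", "de", ...).
french = load_dataset("xnli", "fr", split="train")
german = load_dataset("xnli", "de", split="train")

# Mix the languages so the classifier sees all of them during training.
multilingual_train = concatenate_datasets([french, german]).shuffle(seed=42)

print(multilingual_train[0])        # premise / hypothesis / label
print(multilingual_train.features)  # label names for the classification task
```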
Challenges
- The current multilingual models are typically limited to around 100 languages. Check the corresponding papers to see whether your language is supported; a quick tokenizer-based sanity check is also sketched below.
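Beyond reading the papers, a rough heuristic (an assumption on my part, not an official check) is to look at how the model's tokenizer fragments text in your language: heavy fragmentation or unknown tokens usually indicate that the language was poorly represented in pretraining.

```python
# A rough sanity check of tokenizer coverage for a given language.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # example checkpoint

sentence = "Replace this with a sentence in the language you want to check."
tokens = tokenizer.tokenize(sentence)

print(tokens)
print(f"{len(tokens)} tokens for {len(sentence.split())} words")
print("Contains unknown tokens:", tokenizer.unk_token in tokens)
```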
Desired project outcomes
- Create a Streamlit or Gradio app on Spaces that can automatically classify text in multiple languages (a minimal Gradio starting point is sketched after this list).
- Don't forget to push all your models and datasets to the Hub so others can build on them!
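For the first outcome, the sketch below shows roughly what the app.py of a Gradio Space could look like; the identifier your-username/your-model is a placeholder for a classifier you have fine-tuned yourself. For the second outcome, the push_to_hub methods on models, tokenizers, and datasets cover the uploading step.

```python
# A minimal Gradio sketch for a Spaces demo. The model identifier is a
# placeholder for your own fine-tuned multilingual classifier on the Hub.
import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="your-username/your-model")

def classify(text):
    # Return the predicted label and its score for display in a gr.Label widget.
    prediction = classifier(text)[0]
    return {prediction["label"]: float(prediction["score"])}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Textbox(lines=3, label="Text in any of your languages"),
    outputs=gr.Label(label="Predicted class"),
    title="Multilingual text classifier",
)

demo.launch()
```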
Additional resources
- This project has some overlap with the summarization section of Chapter 7 in the course.