Please read the topic category description to understand what this is all about.
For many online applications, the language an end user will communicate in is not known in advance. The goal of this project is to build a system that automatically predicts the language a text is written in.
There are a few popular multilingual models that you can start with:
There are quite a few multilingual datasets available on the Hub. Many of these have a “language” field that could be used as a target for the model to predict.
This project will likely require you to combine several datasets to get enough coverage across many languages.
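As a rough illustration of what combining datasets might involve, here is a minimal sketch in plain Python. The two sources, their field names (`sentence`, `language`, `text`, `lang`) and the rows themselves are hypothetical; real datasets on the Hub will have their own schemas and language-code conventions, so treat this as one possible approach rather than a recipe.

```python
from collections import Counter

# Hypothetical rows from two sources that name their fields differently
# and use different casing for language codes.
source_a = [
    {"sentence": "Good morning", "language": "en"},
    {"sentence": "Guten Morgen", "language": "de"},
]
source_b = [
    {"text": "Bonjour tout le monde", "lang": "FR"},
    {"text": "Good evening", "lang": "EN"},
]


def normalize(rows, text_key, lang_key):
    """Map rows onto a shared {"text", "language"} schema with lowercase codes."""
    return [
        {"text": row[text_key].strip(), "language": row[lang_key].strip().lower()}
        for row in rows
    ]


combined = normalize(source_a, "sentence", "language") + normalize(
    source_b, "text", "lang"
)

# Per-language counts show which languages still lack coverage.
coverage = Counter(row["language"] for row in combined)
print(coverage)
```

With the `datasets` library, the same kind of normalization could probably be applied with `Dataset.map` before merging with `concatenate_datasets`, but the exact steps depend on the datasets you pick.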
- Create a Streamlit or Gradio app on Spaces that can predict the language of a piece of text provided by an end-user
- Don’t forget to push all your models and datasets to the Hub so others can build on them!
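To give a feel for the demo app, here is a small sketch of how a Gradio interface could wrap a prediction function. `predict_language`, `STOPWORDS` and `build_demo` are hypothetical names, and the stopword-counting rule is only a toy stand-in for your trained model, covering three languages for illustration.

```python
# Tiny, hand-picked stopword lists; a placeholder for a real model.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "und", "ist", "das", "nicht"},
    "fr": {"le", "et", "est", "les", "des"},
}


def predict_language(text):
    """Return a {language: score} mapping based on stopword hits."""
    words = text.lower().split()
    hits = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No known stopword was found, so the toy rule cannot decide.
        return {"unknown": 1.0}
    return {lang: count / total for lang, count in hits.items()}


def build_demo():
    """Wrap the predictor in a Gradio interface (requires `gradio`)."""
    # Imported lazily so predict_language can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=predict_language,
        inputs="text",
        outputs="label",
        title="Language identification",
    )
```

Calling `build_demo().launch()` should start the app locally. In a Space you would replace the body of `predict_language` with a call to your own model.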
A good baseline to compare your model against is the Python
To chat and organise with other people interested in this project, head over to our Discord and:
Follow the instructions on the
Just make sure you comment here to indicate that you’ll be contributing to this project.