Build a language detector

:wave: Please read the topic category description to understand what this is all about.


For many online applications it is not known in advance what language an end-user will communicate in. The goal of this project is to build a system that can automatically predict the language a text is written in.


There are a few popular multilingual models that you can start with:


There are quite a few multilingual datasets available on the Hub. Many of these have a “language” field that could be used as a target for the model to predict.


This project will likely require you to combine several datasets together to gain enough coverage of many languages.
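Combining sources mostly comes down to normalising each dataset to one layout: a text column, a language column, and a consistent code style. Below is a minimal pure-Python sketch of that step. The toy records, the field names (`sentence`, `lang`) and the `ISO3_TO_ISO2` mapping are hypothetical stand-ins for whatever the Hub datasets you pick actually use. With 🤗 Datasets you would typically do the same with `Dataset.rename_column` and `concatenate_datasets`.

```python
from collections import Counter

# Hypothetical toy records standing in for rows from two Hub datasets.
# Real datasets may name the field differently ("language", "lang", ...) and
# may use different code styles ("en" vs "eng"), so both are normalised below.
dataset_a = [
    {"text": "Hello, how are you?", "language": "en"},
    {"text": "Bonjour, comment ça va ?", "language": "fr"},
]
dataset_b = [
    {"sentence": "Hallo, wie geht's?", "lang": "deu"},
    {"sentence": "Good morning!", "lang": "eng"},
    {"sentence": "Buenos días", "lang": "spa"},
]

# Assumed mapping from 3-letter codes to the 2-letter codes used as labels.
ISO3_TO_ISO2 = {"eng": "en", "fra": "fr", "deu": "de", "spa": "es"}


def normalise(records, text_field, lang_field):
    """Yield (text, language) pairs with a consistent layout and code style."""
    for row in records:
        code = row[lang_field]
        yield row[text_field], ISO3_TO_ISO2.get(code, code)


def combine(sources, keep_languages):
    """Merge several sources, keeping only the languages in scope."""
    examples = []
    for records, text_field, lang_field in sources:
        for text, lang in normalise(records, text_field, lang_field):
            if lang in keep_languages:
                examples.append({"text": text, "language": lang})
    # Integer label ids in a stable (sorted) order, usable as classifier targets.
    label2id = {lang: i for i, lang in enumerate(sorted(keep_languages))}
    for ex in examples:
        ex["label"] = label2id[ex["language"]]
    return examples, label2id


examples, label2id = combine(
    [(dataset_a, "text", "language"), (dataset_b, "sentence", "lang")],
    keep_languages={"en", "fr", "de"},
)
print(label2id)  # {'de': 0, 'en': 1, 'fr': 2}
print(Counter(ex["language"] for ex in examples))
```

Restricting `keep_languages` to a handful of codes is also an easy way to start with a small scope and grow it later.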

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that can predict the language of a piece of text provided by an end-user
  • Don’t forget to push all your models and datasets to the Hub so others can build on them!
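For the Spaces app, a Gradio `Interface` with a text box as input and a `Label` as output is one simple way to show the top predicted languages with their confidences. This is a minimal sketch, assuming Gradio is installed on the Space; `predict_language` is a hypothetical placeholder that you would replace with your own model's probabilities. Gradio is imported inside `build_demo` so the formatting helper can be reused without it.

```python
def top_k_scores(ranked, k=3):
    """Turn (language, probability) pairs into the {label: confidence} dict gr.Label displays."""
    top = sorted(ranked, key=lambda pair: pair[1], reverse=True)[:k]
    return {lang: float(prob) for lang, prob in top}


def predict_language(text):
    # Hypothetical placeholder: replace with your fine-tuned model's output,
    # e.g. softmax probabilities from a transformers sequence classifier.
    return [("en", 0.7), ("fr", 0.2), ("de", 0.1)]


def build_demo():
    import gradio as gr  # imported lazily so top_k_scores works without Gradio

    def classify(text):
        return top_k_scores(predict_language(text))

    return gr.Interface(
        fn=classify,
        inputs=gr.Textbox(lines=3, label="Text"),
        outputs=gr.Label(num_top_classes=3, label="Predicted language"),
        title="Language detector",
    )
```

In the Space's `app.py`, you would then end with `build_demo().launch()`.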

Additional resources

A good baseline to compare your model against is the Python langid library.
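To compare against the baseline, it helps to score langid and your own model with the same evaluation function. The sketch below is one way to do that, assuming `langid` is installed (`pip install langid`); `langid.classify` returns a `(language_code, score)` tuple. `evaluate` takes any `predict(text) -> code` callable, so the same function works for your model too.

```python
from collections import defaultdict


def langid_predict(text):
    """Baseline prediction with the langid library."""
    import langid  # imported lazily so evaluate() can be reused without langid

    lang, _score = langid.classify(text)  # returns (language code, score)
    return lang


def evaluate(predict, examples):
    """Overall and per-language accuracy of `predict` on (text, gold_lang) pairs."""
    correct = 0
    per_lang = defaultdict(lambda: [0, 0])  # lang -> [correct, total]
    for text, gold in examples:
        hit = predict(text) == gold
        correct += hit
        per_lang[gold][0] += hit
        per_lang[gold][1] += 1
    overall = correct / len(examples)
    return overall, {lang: c / n for lang, (c, n) in per_lang.items()}
```

If your model only covers a few languages, calling `langid.set_languages(["en", "fr", "de"])` before `evaluate(langid_predict, test_examples)` (where `test_examples` is your own held-out set) restricts langid to the same label set, which keeps the comparison fair.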

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel

  • Join the #language-detection channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:

Hi, I would like to work on this project.

Cool to hear @ivanlau ! I’ve created a Discord channel (see topic description) in case you and others want to use it :slight_smile:

@lewtun Hi, I’m interested in working on this project.
Are we supposed to consider all the languages recognized by the langid library (which we’ll be using as a baseline), or is it OK to consider fewer languages?

It’s totally fine for you to choose the scope for the project :slight_smile: I agree that working with a few languages is a great way to start!

Hi @lewtun ! I want to work on this project.

Alright, thanks!! I’ll do so :slight_smile:

Hi, I’d be glad to join this project. Is there still time?

I think you’re the 4th person, so yes there’s still time and space on the team!

Hi, is this project still on?

Hi @hfawaz, this community event ended last November. Having said that, you’re more than welcome to use the #course:course-event topics as inspiration to build NLP powered applications :slight_smile:

Thanks! Do you have any shared open resources that you guys found while following this course?

You might find the official tutorials in the transformers library to be helpful: 🤗 Transformers Notebooks