Create a spellchecking system

:wave: Please read the topic category description to understand what this is all about

Description

Nowadays you can find spellcheckers everywhere: on your phone, in Microsoft Word, and so on. The goal of this project is to train a Transformer model that automatically corrects spelling in a language of your choosing!

Model(s)

You can frame spellchecking as a sequence-to-sequence task, so a good starting point is to check out the machine translation example in Chapter 7 of the :hugs: Course. Once you understand that, a T5 or mT5 model (mT5 being the multilingual variant) is a good choice for training your own spellchecker.
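
To make the sequence-to-sequence framing concrete, here is a minimal sketch of how (misspelled, corrected) sentence pairs could be turned into model inputs and labels. The task prefix `fix spelling: ` and the helper names `build_examples` and `tokenize_batch` are assumptions made for illustration, not something prescribed by the course; the `text_target` argument is the one used for labels in recent versions of :hugs: Transformers tokenizers.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical task prefix; T5-style models are usually prompted with a short
# task description, but the exact wording here is an assumption.
PREFIX = "fix spelling: "


def build_examples(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (noisy, clean) pairs into source/target columns."""
    pairs = list(pairs)
    return {
        "source": [PREFIX + noisy for noisy, _ in pairs],
        "target": [clean for _, clean in pairs],
    }


def tokenize_batch(batch: Dict[str, List[str]], tokenizer, max_length: int = 128):
    """Tokenize sources as inputs and targets as labels.

    `tokenizer` is expected to be a Transformers tokenizer, for example
    AutoTokenizer.from_pretrained("t5-small").
    """
    return tokenizer(
        batch["source"],
        text_target=batch["target"],
        max_length=max_length,
        truncation=True,
    )
```

With a real tokenizer, `tokenize_batch(build_examples([("teh cat", "the cat")]), tokenizer)` should return `input_ids` for the misspelled sentence and `labels` for the corrected one, which is the same shape of data the translation example feeds to its model.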

Datasets

The GitHub Typo corpus is a good place to start. An alternative is to use back-translation to create your own noisy corpus: translating a sentence into another language and back again typically introduces small errors, so each round-trip output can be paired with its original sentence.
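
The back-translation idea could be sketched roughly as follows. The two translators are passed in as plain callables (`to_pivot` and `from_pivot` are hypothetical names), so any machine translation system can be plugged in; this is only an outline under that assumption, not a tuned pipeline.

```python
from typing import Callable, Iterable, List, Tuple

Translator = Callable[[str], str]


def back_translate(sentence: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back to the original language."""
    return from_pivot(to_pivot(sentence))


def build_noisy_pairs(
    clean_sentences: Iterable[str],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Return (noisy, clean) pairs, skipping round trips that changed nothing."""
    pairs = []
    for clean in clean_sentences:
        noisy = back_translate(clean, to_pivot, from_pivot)
        # Keep only non-empty outputs that actually differ from the original.
        if noisy.strip() and noisy != clean:
            pairs.append((noisy, clean))
    return pairs
```

In practice the callables could wrap two translation models from the Hub (for instance an English-to-German and a German-to-English model). Round trips tend to produce paraphrases as well as small errors, so it is probably worth inspecting or filtering the resulting pairs before training on them.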

Challenges

This is a rather open-ended project, and one that might require some careful data preprocessing / augmentation. A good starting strategy would be to adapt the example given in the resources below, but using the :hugs: ecosystem instead of the fairseq library.

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that [Fill description]
  • Don’t forget to push all your models and datasets to the Hub so others can build on them!
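
As a starting point for the demo app, a Gradio interface can wrap a single text-in, text-out function. In the sketch below, `make_corrector` and `build_demo` are hypothetical helper names, `generate_fn` stands in for whatever calls your fine-tuned model, and the `fix spelling: ` prefix is an assumption that should match the one used during training.

```python
from typing import Callable


def make_corrector(generate_fn: Callable[[str], str], prefix: str = "fix spelling: "):
    """Wrap a text-generation callable into a function suitable for a demo."""

    def correct(text: str) -> str:
        text = text.strip()
        if not text:
            # Avoid sending empty prompts to the model.
            return ""
        return generate_fn(prefix + text)

    return correct


def build_demo(correct: Callable[[str], str]):
    """Build a simple Gradio interface around the correction function."""
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    return gr.Interface(fn=correct, inputs="text", outputs="text", title="Spellchecker")
```

Calling `build_demo(make_corrector(generate_fn)).launch()` starts the app locally; the same script can then be used as the entry point of a Space.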

Additional resources
