Create a spellchecking system

lewtun · November 10, 2021, 4:50pm

Please read the topic category description to understand what this is all about.

Description

Nowadays you can find spellcheckers everywhere: on your phone, in Microsoft Word, and so on. The goal of this project is to train a Transformer model to automatically correct spelling in a language of your choosing!

Model(s)

You can frame spellchecking as a sequence-to-sequence task, where the input is the misspelled text and the output is the corrected text, so a good first step is to check out the machine translation example in Chapter 7 of the Course. Once you understand that, T5 or mT5 is a good model to start training with.
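To make the sequence-to-sequence framing concrete, here is a minimal sketch of how misspelled/corrected sentence pairs might be arranged into source/target records before tokenization. The `fix spelling: ` prefix, the `source`/`target` field names, and the `to_seq2seq_examples` helper are all assumptions made for illustration; none of them come from the Course or from T5 itself.

```python
from typing import Dict, List, Tuple

# Hypothetical task prefix. T5-style models are often trained with a short
# text prefix, but this exact wording is an assumption for illustration.
TASK_PREFIX = "fix spelling: "


def to_seq2seq_examples(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (misspelled, corrected) pairs into source/target records."""
    examples = []
    for noisy, clean in pairs:
        examples.append(
            {
                "source": TASK_PREFIX + noisy.strip(),  # model input
                "target": clean.strip(),                # text the model should produce
            }
        )
    return examples


pairs = [
    ("I beleive this is corect.", "I believe this is correct."),
    ("Thier house is big.", "Their house is big."),
]
examples = to_seq2seq_examples(pairs)
```

From here, the records could be tokenized much like the translation pairs in the Chapter 7 example, with `source` as the encoder input and `target` as the labels.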

Datasets

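The GitHub Typo corpus is a good place to start. An alternative is to use back-translation to create your own noisy corpus: translate clean sentences into another language and back again, since most machine translation systems introduce small errors along the way. Pairing each round-trip output with its original sentence gives you a training example.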

Challenges

This is a rather open-ended project, and one that might require some careful data preprocessing / augmentation. A good starting strategy would be to adapt the example given in the resources below, but using the Hugging Face ecosystem instead of the fairseq library.
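As one possible take on the augmentation step mentioned above, the sketch below injects synthetic character-level typos into clean sentences to produce (noisy, clean) pairs. This is a simpler technique than back-translation and is not something the project description prescribes; the function names (`add_typo`, `corrupt`, `make_pairs`), the three edit operations, and the default `rate` of 0.2 are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def add_typo(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit to a word."""
    if len(word) < 3:
        return word  # leave very short words untouched
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "drop", "repeat"])
    if op == "swap":
        # Transpose two adjacent characters.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        # Delete one character.
        return word[:i] + word[i + 1:]
    # Duplicate one character.
    return word[:i] + word[i] + word[i:]


def corrupt(sentence: str, rate: float = 0.2, seed: int = 0) -> str:
    """Corrupt roughly `rate` of the whitespace-separated words."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    words = sentence.split()
    return " ".join(add_typo(w, rng) if rng.random() < rate else w for w in words)


def make_pairs(sentences: List[str], rate: float = 0.2, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (noisy, clean) training pairs, using a different seed per sentence."""
    return [(corrupt(s, rate, seed + k), s) for k, s in enumerate(sentences)]


clean = ["The quick brown fox jumps over the lazy dog."]
training_pairs = make_pairs(clean, rate=0.5, seed=42)
```

Random edits like these only approximate real typing mistakes, so it is probably worth mixing them with naturally occurring errors such as those in the GitHub Typo corpus.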

Desired project outcomes

Create a Streamlit or Gradio app on Spaces that [Fill description]
Don’t forget to push all your models and datasets to the Hub so others can build on them!
