Pre-training/fine-tuning Seq2Seq model for spelling and/or grammar correction in French

For this project, one can use a randomly initialized or a pre-trained BART/T5 model.


Pre-trained BART and T5 models can be found on the model hub.


The dataset for this model can be prepared as described in this blog post.
One can make use of OSCAR. The dataset is also available through the datasets library here: oscar · Datasets at Hugging Face.
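The preparation boils down to building (noisy, clean) sentence pairs, where the corrupted sentence is the model input and the original sentence is the target. A minimal sketch of that pair format, using accent stripping as a stand-in corruption (the sentences below are illustrative, not taken from OSCAR):

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Drop accents, a common French spelling error (e.g. 'été' -> 'ete')."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if not unicodedata.combining(c))

# Stand-in for clean sentences streamed from a corpus such as OSCAR.
clean_sentences = [
    "Le chat est très mignon.",
    "Nous sommes allés à l'école hier.",
]

# Seq2Seq pairs: corrupted text as input, original text as target.
pairs = [{"input": strip_accents(s), "target": s} for s in clean_sentences]

for p in pairs:
    print(p["input"], "->", p["target"])
```

In the real project the corruption step would be replaced by a proper noising function (see the Challenges section below), but the pair layout stays the same.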

Available training scripts

As this will be a Seq2Seq model, the script can be used for training.

(Optional) Desired project outcome

The desired outcome is to train a spelling correction model for the French language. This can be showcased directly on the hub or with a Streamlit or Gradio app.

(Optional) Challenges

Implementing the dataset noising function would be the challenging part of the project.
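A minimal sketch of what such a noising function could look like. Assumptions: four character-level corruption types (accent stripping, deletion, swap, duplication) are used to approximate common French spelling mistakes; a real implementation would likely add word-level grammar noise (gender/number agreement, homophones like "a"/"à") on top of this.

```python
import random
import unicodedata

def noise_sentence(text: str, prob: float = 0.1, seed=None) -> str:
    """Randomly corrupt characters of `text` with probability `prob` each."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < prob:
            op = rng.choice(["strip_accent", "delete", "swap", "duplicate"])
            if op == "strip_accent":
                # 'é' -> 'e'; a non-accented letter is left unchanged.
                out.append(unicodedata.normalize("NFD", c)[0])
            elif op == "delete":
                pass  # drop the character
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])
                out.append(c)
                i += 1  # both characters consumed
            else:
                # duplicate (also the fallback for a swap at end of string)
                out.append(c * 2)
        else:
            out.append(c)
        i += 1
    return "".join(out)

clean = "Les élèves étaient déjà arrivés à l'heure."
noisy = noise_sentence(clean, prob=0.3, seed=0)
print(noisy, "->", clean)
```

With `prob=0.0` the function is the identity, and a fixed seed makes runs deterministic, which helps when regenerating the same noised corpus across machines.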

(Optional) Links to read up on

Hi @valhalla, I have previously worked on grammar correction for English using T5, and it gives great results. It would be an exciting task to do the same for French, and I would like to be part of this project.


This one sounds interesting…

Awesome! Let’s define this project then :slight_smile:

Added you to the team, @khalidsaifullaah and @Vaibhavbrkn. Let me know if you have any comments, either here or in the sheet.

Thanks @valhalla, but since I have taken on new commitments for other projects, I can no longer be a part of this project.

Noted, removed you from the team.

Interested in this project :heart_eyes: