ASR spell correction

Many ideas from automatic post-editing and automatic grammar correction can probably be used here as well. Those are some good keywords to get you started.


Hi.

When searching for solutions for correcting ASR errors, I found this topic in the HF forum.

I would like to discuss 2 models with you.

FastCorrect

Recently, Microsoft Asia published the FastCorrect paper (and more recently, FastCorrect 2). I like the 2 main ideas on which this model is based:

  1. Training of an edit-distance alignment model that adapts the number of tokens in the source (the sentence with errors from the ASR output) to the number in the target (the sentence without errors): the decoder input thus has the right number of tokens (the target length), and the decoder can focus on predicting the right token (when a change is needed) for each decoder input token (see the rough sketch after this list).
  2. Use of a non-autoregressive (NAR) decoder to predict all the target tokens in parallel: compared to an autoregressive decoder, the NAR decoder speeds up the prediction of all target tokens by about 9×. This is a proposed solution for using such an ASR error correction model in real time.
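
To make idea 1 concrete, here is a rough Python sketch of how I understand the edit alignment (my own reading of the paper, not official code; the function names are mine): align the source and target tokens via edit distance, derive how many target tokens each source token should expand to, and build a decoder input that already has the target length.

```python
# Rough sketch of FastCorrect-style edit alignment (my reading of the paper,
# not the official implementation). Each source token gets a "duration" =
# number of target tokens it aligns to; during training these durations
# supervise a length predictor, and at inference the predicted durations
# tell the NAR decoder how many slots to allocate per source token.
from difflib import SequenceMatcher

def edit_alignment_durations(src_tokens, tgt_tokens):
    durations = [0] * len(src_tokens)
    matcher = SequenceMatcher(a=src_tokens, b=tgt_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                durations[i] = 1                      # token kept as-is
        elif tag == "replace":
            # spread the (j2 - j1) target tokens over the (i2 - i1) source tokens
            n_src, n_tgt = i2 - i1, j2 - j1
            for k, i in enumerate(range(i1, i2)):
                durations[i] = n_tgt // n_src + (1 if k < n_tgt % n_src else 0)
        elif tag == "delete":
            for i in range(i1, i2):
                durations[i] = 0                      # token should be dropped
        elif tag == "insert":
            # attach inserted target tokens to the previous source token
            durations[max(i1 - 1, 0)] += j2 - j1
    return durations

def expand_for_decoder(src_tokens, durations):
    # Decoder input: each source token repeated "duration" times, so its
    # length already matches the target length.
    out = []
    for tok, d in zip(src_tokens, durations):
        out.extend([tok] * d)
    return out

src = "i red the book to day".split()
tgt = "i read the book today".split()
dur = edit_alignment_durations(src, tgt)
print(dur)                            # e.g. [1, 1, 1, 1, 1, 0]; always sums to len(tgt)
print(expand_for_decoder(src, dur))   # 5 decoder-input slots for 5 target tokens
```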

Interesting, no? What do you think of FastCorrect?

However, I did not find any released code.

paper: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition (last revised 1 Oct 2021)

T5 (or ByT5)

flexudy published Sentence Doctor on the HF model hub (github), a T5 model that attempts to correct the errors or mistakes found in sentences (the model works on English, German and French text). The training script is provided (train_any_t5_task.py): it should look like the HF translation scripts / HF translation notebook, but flexudy explains they used Abhishek Kumar Mishra’s transformer tutorial on text summarization (see as well the HF summarization notebook).
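
For anyone who wants to try it quickly, inference is a standard T5 text-to-text call. A minimal sketch (the checkpoint id and the input prefix below are what I recall from flexudy's hub page and model card, so double-check them there):

```python
# Minimal seq2seq inference sketch for Sentence Doctor (generic transformers
# usage; the checkpoint id and "repair_sentence:" prefix are taken from the
# model card as I recall it -- verify both there before relying on this).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "flexudy/t5-base-multi-sentence-doctor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

noisy = "repair_sentence: are yu sure this is teh right adress?"
inputs = tokenizer(noisy, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```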

Interesting, no? What do you think of using T5 (or ByT5) for ASR error correction?

Note: as the T5 decoder is autoregressive, I guess Sentence Doctor could not be used for ASR error correction in real time. Any thoughts on this (real-time) issue?


@pierreguillou thanks for pointing out our model. We also trained a model for wav2vec2 transcripts: flexudy/t5-small-wav2vec2-grammar-fixer
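
A rough sketch of how to chain a wav2vec2 transcript into the fixer (the prompt format is documented on the model card; the "fix: { ... }" string below is just a placeholder to illustrate the flow):

```python
# Sketch: pipe a wav2vec2 transcript through the grammar-fixer T5 model.
# (Illustrative only; check the grammar-fixer model card for the exact
# prompt format it was trained with -- "fix: { ... }" here is a placeholder.)
import torch
import soundfile as sf
from transformers import (Wav2Vec2Processor, Wav2Vec2ForCTC,
                          AutoTokenizer, AutoModelForSeq2SeqLM)

# 1) ASR with wav2vec2
asr_name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(asr_name)
asr_model = Wav2Vec2ForCTC.from_pretrained(asr_name)

speech, sample_rate = sf.read("sample.wav")          # 16 kHz mono expected
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = asr_model(inputs.input_values).logits
transcript = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

# 2) Correction with the T5 grammar fixer
fixer_name = "flexudy/t5-small-wav2vec2-grammar-fixer"
tok = AutoTokenizer.from_pretrained(fixer_name)
fixer = AutoModelForSeq2SeqLM.from_pretrained(fixer_name)

prompt = "fix: { " + transcript.lower() + " }"       # placeholder prompt
ids = tok(prompt, return_tensors="pt").input_ids
out = fixer.generate(ids, max_length=128, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))
```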

Fixing the grammar of sentences has been quite a problem for a while. I think it is time to solve this problem right. We will also release a new model + dataset soon.

@flexudy, any updates on the dataset? In my experience with English Silero/Coqui STT models, most ASR transcription mistakes were related to phonetically similar words (does/thus, week/weak, etc.). Aren’t there some datasets/banks of phonetically similar words out there that could be used to create a targeted training dataset for these correction models?

@vblagoje yeah, we are currently building a script that will generate this dataset, containing lists of homophones (e.g. week/weak). Here are some of the points we are currently considering:

  • [ ] Punctuation
  • [ ] Casing
  • [ ] Similar spelling
  • [ ] Homophones
  • [ ] Determiners
  • [ ] Inflexion
  • [ ] Plurality
  • [ ] Pronoun
  • [ ] Deletions
  • [ ] Filler words (e.g. um, uh)

The difficulty is that we want a multi-lingual data generator.
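
To make the homophone point concrete, here is a toy sketch of that corruption step with a tiny hand-made English word bank (purely illustrative; the actual generator has to cover the other points above and multiple languages):

```python
# Toy sketch of generating (corrupted, clean) training pairs by swapping in
# phonetically similar words. The word bank below is a tiny hand-made English
# example; a real generator would need per-language banks and would also
# cover casing, punctuation, fillers, etc.
import random

HOMOPHONES = {
    "week": ["weak"], "weak": ["week"],
    "does": ["thus"], "their": ["there", "they're"],
    "to": ["too", "two"], "right": ["write"],
}

def corrupt(sentence, p=0.3, seed=None):
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        candidates = HOMOPHONES.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))   # swap in a sound-alike
        else:
            out.append(word)
    return " ".join(out)

clean = "i was too weak to write the report last week"
noisy = corrupt(clean, p=0.5, seed=0)
print(noisy, "->", clean)   # (noisy, clean) pair for training a corrector
```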


Hey guys, hi. I am researching different methods to reduce the WER of an ASR model. I have looked at shallow fusion. I am trying to join the Slack channel but the invite link is not working; could you resend it please, or send me an invite at clivefernandes20@gmail.com?
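
(For context, by shallow fusion I mean interpolating the ASR score of each candidate token with an external language model score during decoding. A toy sketch of the scoring step, with made-up probabilities:)

```python
# Toy sketch of shallow fusion: at each beam-search step the ASR (acoustic)
# log-probability of a candidate token is interpolated with an external LM
# log-probability. The weights and probabilities here are made-up placeholders.
import math

def fuse_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """asr_log_probs / lm_log_probs: dict token -> log prob for the next step.
    Returns the candidate tokens sorted by the fused score."""
    fused = {}
    for token, asr_lp in asr_log_probs.items():
        lm_lp = lm_log_probs.get(token, math.log(1e-10))
        fused[token] = asr_lp + lm_weight * lm_lp
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Example: the ASR model slightly prefers "weak", but the LM context
# ("see you next ...") pulls the fused score toward "week".
asr = {"weak": math.log(0.55), "week": math.log(0.45)}
lm  = {"weak": math.log(0.05), "week": math.log(0.90)}
print(fuse_scores(asr, lm, lm_weight=0.5))
```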

Any new progress now?