ASR spell correction

Hi.

While searching for solutions to ASR error correction, I found this topic on the HF forum.

I would like to discuss two models with you.

FastCorrect

Recently, Microsoft Research Asia published the FastCorrect paper (and, more recently, FastCorrect 2). I like the two main ideas on which this model is based:

  1. Training a model based on edit distance to adapt the number of source tokens (the sentence with errors from the ASR output) to the number of target tokens (the sentence without errors). The decoder input then has the right number of tokens (the target length), so the decoder can focus on finding the right token for each input position, changing it only where necessary.
  2. Using a non-autoregressive (NAR) decoder to predict all the target tokens in parallel. According to the paper, this speeds up prediction by a factor of about 9 compared with an autoregressive decoder. It is proposed as a way to use such an ASR error correction model in real time.
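To make the first idea more concrete, here is a minimal sketch of how an edit-distance alignment can tell, for each source token, how many target tokens it corresponds to (0 = deleted, 1 = kept or substituted, 2 or more = something was inserted next to it). This is not the paper's code: the function names `target_counts` and `adjust_length` are hypothetical, and the way ties between equal-cost alignments are broken and the choice to attach inserted tokens to the preceding source token are simplifying assumptions of mine.

```python
from typing import List


def target_counts(source: List[str], target: List[str]) -> List[int]:
    """For each source token, count the target tokens aligned to it.

    Assumption: inserted target tokens are attached to the preceding
    source token (or to the first one if there is none before).
    """
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # keep or substitute
                dist[i - 1][j] + 1,         # delete a source token
                dist[i][j - 1] + 1,         # insert a target token
            )

    # Walk back through the table to recover one optimal alignment.
    counts = [0] * n
    i, j = n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and source[i - 1] == target[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            counts[i - 1] += 1  # one source token <-> one target token
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1              # deletion: this source token gets 0
        else:
            if n > 0:
                counts[max(i - 1, 0)] += 1  # insertion: attach to a neighbour
            j -= 1
    return counts


def adjust_length(source: List[str], counts: List[int]) -> List[str]:
    """Drop or repeat source tokens so the result has the target length."""
    adjusted: List[str] = []
    for token, count in zip(source, counts):
        adjusted.extend([token] * count)
    return adjusted


if __name__ == "__main__":
    src = "the the cat sat".split()
    tgt = "the cat sat".split()
    counts = target_counts(src, tgt)
    print(counts, adjust_length(src, counts))
```

In this sketch the counts are computed from a known source/target pair, which is only possible at training time. As I understand the paper, a predictor is trained to output such counts from the source alone, so that the same length adjustment can be applied at inference before the NAR decoder runs.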

Interesting, no? What do you think of FastCorrect?

However, I did not find any released code.

Paper: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition (last revised 1 Oct 2021)

T5 (or ByT5)

flexudy published Sentence Doctor on the HF model hub (and on GitHub): it is a T5 model that attempts to correct the errors or mistakes found in sentences (the model works on English, German and French text). The training script is provided (train_any_t5_task.py). It should look like the HF translation scripts / HF translation notebook, but flexudy explains that it used Abhishek Kumar Mishra’s transformer tutorial on text summarization (see also the HF summarization notebook).

Interesting, no? What do you think of using T5 (or ByT5) for ASR error correction?

Note: as the T5 decoder is autoregressive, I guess Sentence Doctor could not be used for ASR error correction in real time. Any thoughts about this real-time issue?
