ASR spell correction

Many ideas from automatic post-editing and automatic grammar correction can probably be used here as well. Those are some good keywords to get you started.


Hi.

When searching for solutions for correcting ASR errors, I found this topic in the HF forum.

I would like to discuss 2 models with you.

FastCorrect

Recently, Microsoft Asia published the FastCorrect paper (and more recently, FastCorrect 2). I like the 2 main ideas on which this model is based:

  1. Training of an edit-distance alignment model that adapts the number of tokens in the source (the sentence with errors from the ASR output) to the number in the target (the sentence without errors): the decoder input thus has the right number of tokens (the target length), and the decoder can focus on predicting the right token (when a change is needed) for each decoder input token (see the rough sketch after this list).
  2. Use of a non-autoregressive (NAR) decoder to predict all the target tokens in parallel: compared to an autoregressive decoder, the NAR decoder speeds up the prediction of all target tokens by about 9×. This is a proposed solution for using such an ASR error correction model in real time.
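
To make idea 1 concrete, here is a rough Python sketch of how I understand the edit alignment (my own reading of the paper, not official code; the function names are mine): align the source and target tokens via edit distance, derive how many target tokens each source token should expand to, and build a decoder input that already has the target length.

```python
# Rough sketch of FastCorrect-style edit alignment (my reading of the paper,
# not the official implementation). Each source token gets a "duration" =
# number of target tokens it aligns to; during training these durations
# supervise a length predictor, and at inference the predicted durations
# tell the NAR decoder how many slots to allocate per source token.
from difflib import SequenceMatcher

def edit_alignment_durations(src_tokens, tgt_tokens):
    durations = [0] * len(src_tokens)
    matcher = SequenceMatcher(a=src_tokens, b=tgt_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                durations[i] = 1                      # token kept as-is
        elif tag == "replace":
            # spread the (j2 - j1) target tokens over the (i2 - i1) source tokens
            n_src, n_tgt = i2 - i1, j2 - j1
            for k, i in enumerate(range(i1, i2)):
                durations[i] = n_tgt // n_src + (1 if k < n_tgt % n_src else 0)
        elif tag == "delete":
            for i in range(i1, i2):
                durations[i] = 0                      # token should be dropped
        elif tag == "insert":
            # attach inserted target tokens to the previous source token
            durations[max(i1 - 1, 0)] += j2 - j1
    return durations

def expand_for_decoder(src_tokens, durations):
    # Decoder input: each source token repeated "duration" times, so its
    # length already matches the target length.
    out = []
    for tok, d in zip(src_tokens, durations):
        out.extend([tok] * d)
    return out

src = "i red the book to day".split()
tgt = "i read the book today".split()
dur = edit_alignment_durations(src, tgt)
print(dur)                            # e.g. [1, 1, 1, 1, 1, 0]; always sums to len(tgt)
print(expand_for_decoder(src, dur))   # 5 decoder-input slots for 5 target tokens
```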

Interesting, no? What do you think of FastCorrect?

However, I did not find any released code.

paper: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition (last revised 1 Oct 2021)

T5 (or ByT5)

flexudy published Sentence Doctor on the HF model hub (github), a T5 model that attempts to correct the errors or mistakes found in sentences (the model works on English, German and French text). The training script is provided (train_any_t5_task.py): it should look like the HF translation scripts / HF translation notebook, but flexudy explains they used Abhishek Kumar Mishra’s transformer tutorial on text summarization (see as well the HF summarization notebook).
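
For anyone who wants to try it quickly, inference is a standard T5 text-to-text call. A minimal sketch (the checkpoint id and the input prefix below are what I recall from flexudy's hub page and model card, so double-check them there):

```python
# Minimal seq2seq inference sketch for Sentence Doctor (generic transformers
# usage; the checkpoint id and "repair_sentence:" prefix are taken from the
# model card as I recall it -- verify both there before relying on this).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "flexudy/t5-base-multi-sentence-doctor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

noisy = "repair_sentence: are yu sure this is teh right adress?"
inputs = tokenizer(noisy, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```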

Interesting, no? What do you think of using T5 (or ByT5) for ASR error correction?

Note: as the T5 decoder is autoregressive, I guess Sentence Doctor could not be used for ASR error correction in real time. Any thoughts on this (real-time) issue?


@pierreguillou thanks for pointing out our model. We also trained a model for wav2vec2 transcripts: flexudy/t5-small-wav2vec2-grammar-fixer
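
A rough sketch of how to chain a wav2vec2 transcript into the fixer (the prompt format is documented on the model card; the "fix: { ... }" string below is just a placeholder to illustrate the flow):

```python
# Sketch: pipe a wav2vec2 transcript through the grammar-fixer T5 model.
# (Illustrative only; check the grammar-fixer model card for the exact
# prompt format it was trained with -- "fix: { ... }" here is a placeholder.)
import torch
import soundfile as sf
from transformers import (Wav2Vec2Processor, Wav2Vec2ForCTC,
                          AutoTokenizer, AutoModelForSeq2SeqLM)

# 1) ASR with wav2vec2
asr_name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(asr_name)
asr_model = Wav2Vec2ForCTC.from_pretrained(asr_name)

speech, sample_rate = sf.read("sample.wav")          # 16 kHz mono expected
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = asr_model(inputs.input_values).logits
transcript = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

# 2) Correction with the T5 grammar fixer
fixer_name = "flexudy/t5-small-wav2vec2-grammar-fixer"
tok = AutoTokenizer.from_pretrained(fixer_name)
fixer = AutoModelForSeq2SeqLM.from_pretrained(fixer_name)

prompt = "fix: { " + transcript.lower() + " }"       # placeholder prompt
ids = tok(prompt, return_tensors="pt").input_ids
out = fixer.generate(ids, max_length=128, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))
```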

Fixing the grammar of sentences has been quite a problem for a while. I think it is time to solve this problem right. We will also release a new model + dataset soon.

@flexudy, any updates on the dataset? In my experience with English Silero/Coqui STT models, most ASR transcription mistakes were related to phonetically similar words (does/thus, week/weak, etc.). Aren’t there some datasets/banks of phonetically similar words out there that could be used to create a targeted training dataset for these correction models?

@vblagoje yeah, we are currently building a script that will generate this dataset, containing lists of homophones (e.g. week/weak). Here are some of the points we are currently considering:

  • [ ] Punctuation
  • [ ] Casing
  • [ ] Similar spelling
  • [ ] Homophones
  • [ ] Determiners
  • [ ] Inflexion
  • [ ] Plurality
  • [ ] Pronoun
  • [ ] Deletions
  • [ ] Filler words (e.g. um, uh)

The difficulty is that we want a multi-lingual data generator.
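
To make the homophone point concrete, here is a toy sketch of that corruption step with a tiny hand-made English word bank (purely illustrative; the actual generator has to cover the other points above and multiple languages):

```python
# Toy sketch of generating (corrupted, clean) training pairs by swapping in
# phonetically similar words. The word bank below is a tiny hand-made English
# example; a real generator would need per-language banks and would also
# cover casing, punctuation, fillers, etc.
import random

HOMOPHONES = {
    "week": ["weak"], "weak": ["week"],
    "does": ["thus"], "their": ["there", "they're"],
    "to": ["too", "two"], "right": ["write"],
}

def corrupt(sentence, p=0.3, seed=None):
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        candidates = HOMOPHONES.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))   # swap in a sound-alike
        else:
            out.append(word)
    return " ".join(out)

clean = "i was too weak to write the report last week"
noisy = corrupt(clean, p=0.5, seed=0)
print(noisy, "->", clean)   # (noisy, clean) pair for training a corrector
```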


Hey guys, hi. I am researching different methods to reduce the WER of an ASR model. I have looked at shallow fusion. I am trying to join the Slack channel but the invite link is not working; could you resend it please, or send me an invite at clivefernandes20@gmail.com?
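
(For context, by shallow fusion I mean interpolating the ASR score of each candidate token with an external language model score during decoding. A toy sketch of the scoring step, with made-up probabilities:)

```python
# Toy sketch of shallow fusion: at each beam-search step the ASR (acoustic)
# log-probability of a candidate token is interpolated with an external LM
# log-probability. The weights and probabilities here are made-up placeholders.
import math

def fuse_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """asr_log_probs / lm_log_probs: dict token -> log prob for the next step.
    Returns the candidate tokens sorted by the fused score."""
    fused = {}
    for token, asr_lp in asr_log_probs.items():
        lm_lp = lm_log_probs.get(token, math.log(1e-10))
        fused[token] = asr_lp + lm_weight * lm_lp
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Example: the ASR model slightly prefers "weak", but the LM context
# ("see you next ...") pulls the fused score toward "week".
asr = {"weak": math.log(0.55), "week": math.log(0.45)}
lm  = {"weak": math.log(0.05), "week": math.log(0.90)}
print(fuse_scores(asr, lm, lm_weight=0.5))
```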

Any new progress now?