Ideas to correct Wav2Vec2 transcription results

permutans · February 20, 2021, 11:57pm

I’m tinkering with the preprocessing (segmentation) to achieve the best Wav2Vec2 transcriptions I can, and am fairly impressed with the results (compared to others like Silero, and previous experience with Sphinx, or alternatives like ELAN).

(I’m finding I probably need to reduce the maximum segment times below 60s and haven’t quite got that down perfectly yet, but that’s beside the point)

However, there are some pretty glaring phonetic transcription mistakes and I’m wondering if there are standard approaches to adjust these (in an automated way, before resorting to manual adjustment).

For instance, I’m seeing “Boric Johnson” (for Boris Johnson, the UK Prime Minister), “ennay chess” (for NHS, the UK’s health service), and “social medeor” (social media).

These are all clearly phonetic ‘guesses’ and would be identifiable as outside the vocabulary of a standard text model: I’m curious if there’s a standard post-processing step that attempts to ‘realign’ such outputs with possible alternatives (which I could perhaps explore in an interface to resolve ambiguous parts).
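To make the "flag and realign" idea concrete, here is a minimal sketch using only Python's standard library. It flags words that are not in a word list and proposes close string matches as alternatives. The `LEXICON` contents, the function name `flag_and_suggest`, and the `cutoff` value are all assumptions made for illustration. A real setup would need a much larger word list that also covers ordinary words.

```python
import difflib

# Hypothetical domain lexicon (an assumption for this sketch); in practice this
# would come from a large word list or the vocabulary of a relevant text corpus.
LEXICON = ["boris", "johnson", "social", "media", "health", "service", "prime", "minister"]

def flag_and_suggest(transcript, lexicon=LEXICON, n=3, cutoff=0.5):
    """Return (word, suggestions) pairs for words that are not in the lexicon."""
    known = set(lexicon)
    flagged = []
    for word in transcript.lower().split():
        if word not in known:
            # get_close_matches ranks candidates by string similarity, best first
            suggestions = difflib.get_close_matches(word, lexicon, n=n, cutoff=cutoff)
            flagged.append((word, suggestions))
    return flagged

for word, suggestions in flag_and_suggest("boric johnson social medeor"):
    print(word, "->", suggestions)
```

String similarity like this can catch near-misses such as “boric” or “medeor”, but it is unlikely to help with an acronym spelled out phonetically, such as “ennay chess” for NHS. That case probably needs a comparison based on pronunciation rather than spelling.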

I’m wondering if anyone knows of the usual approach using language models (or, if this is not a standard step, could suggest an innovative approach). Even just the proper terms for what I’m trying to do here would help me research my next steps.

I’d think it was a similar-ish problem to spelling mistakes (for which there’s ‘T5 Sentence Doctor’ for example, but tests aren’t too encouraging). I may be missing a more appropriate alternative I don’t know about so I thought I’d ask the community here.

Thanks. This is my first question on the forum, so please let me know if it’s off topic. I’ve used the new Wav2Vec2 960h model and may try the other versions next, but I expect this will apply to all of them.

omarsou · May 11, 2021, 10:14am

Hi,

I think you need a language model trained on a large corpus. You can have a look at this issue.

Best,
Omar
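As a rough illustration of why a language model helps here, the sketch below scores alternative transcripts with a toy bigram model and keeps the most probable one. Everything in it is an assumption for illustration: the three-sentence `CORPUS` stands in for a large corpus, and the helper names `train_bigram`, `log_prob`, and `best_candidate` are hypothetical. In practice an n-gram model trained on far more text is commonly combined with beam-search decoding of the acoustic model's CTC outputs, rather than applied to finished transcripts like this.

```python
import math
from collections import Counter

# Toy training text (an assumption standing in for a large corpus).
CORPUS = [
    "boris johnson spoke about social media",
    "the prime minister boris johnson visited the health service",
    "social media is widely used",
]

def train_bigram(sentences):
    """Count unigrams and bigrams, with sentence start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # +1 leaves room for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def best_candidate(candidates, unigrams, bigrams):
    """Pick the candidate transcript that the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

unigrams, bigrams = train_bigram(CORPUS)
print(best_candidate(["boric johnson spoke", "boris johnson spoke"], unigrams, bigrams))
```

The candidates compared here have the same number of words, which keeps the scores comparable. Candidates of different lengths would need some form of length normalisation.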
