@patrickvonplaten Thank you for the great work on releasing the many variants of wav2vec2 and the tutorials - they are super helpful. I am new to the ASR domain and was able to reproduce some of the results with the released models. When I tried to compare the WER against the paper, I noticed a gap between HF's models and the paper's scores, and I couldn't figure out where it was coming from. I've put the numbers in the table below - could you shed some light? Thank you!
| Model | Pretraining Dataset | Fine-Tuning Dataset | Eval. Dataset | WER (%) | Relative (%) |
|---|---|---|---|---|---|
| Wav2vec 2.0 - Table 1, 3rd row from bottom | LS-960 | Labelled LS-100 | clean/test | 2.60 | baseline |
| facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | clean/test | 6.10 | 235% |
| Wav2vec 2.0 - Table 1, 3rd row from bottom | LS-960 | Labelled LS-100 | other/test | 6.3 | baseline |
| facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | other/test | 13.5 | 214% |
| Wav2vec 2.0 - Table 2, 3rd row from bottom | LS-960 | Labelled LS-960 | clean/test | 2.1 | baseline |
| facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | clean/test | 3.4 | 162% |
| Wav2vec 2.0 - Table 2, 3rd row from bottom | LS-960 | Labelled LS-960 | other/test | 4.8 | baseline |
| facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | other/test | 8.6 | 179% |
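For reference, the HF numbers above are from plain argmax (no-LM) decoding; below is a minimal sketch of the kind of evaluation I mean, assuming `transformers`, `datasets`, and `jiwer` are installed (batching and text normalization simplified):

```python
# Minimal sketch: greedy (no-LM) WER of facebook/wav2vec2-base-100h on
# LibriSpeech test-clean. Assumes: pip install transformers datasets jiwer
import torch
from datasets import load_dataset
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-100h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h").eval()

dataset = load_dataset("librispeech_asr", "clean", split="test")

predictions, references = [], []
for sample in dataset:
    inputs = processor(
        sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: argmax over the vocabulary, no language model.
    ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(ids)[0])
    references.append(sample["text"])

print(f"WER: {wer(references, predictions):.2%}")
```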
I think that may be because in the paper they add a Transformer LM to improve performance. By the way, is there any discussion about how to combine a Transformer LM with Hugging Face's wav2vec2 model? I've found a blog article written by @patrickvonplaten showing how to boost wav2vec2 with an n-gram LM, but I currently don't know how to combine the model with a Transformer LM.
It should actually be very easy to add an LM to Wav2Vec2 - I've done it here: patrickvonplaten/wav2vec2-base-100h-with-lm · Hugging Face
All you need to do is take an official n-gram, e.g. this one: openslr.org
and then just follow the blog post here: Boosting Wav2Vec2 with n-grams in 🤗 Transformers
The results without an LM should match more or less - I've tested this for a couple of checkpoints. See the decoding sketch below.
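To illustrate, here is a minimal decoding sketch with such a checkpoint, assuming `pyctcdecode` and `kenlm` are installed and that the repo ships both the acoustic model and the n-gram LM:

```python
# Minimal sketch: CTC beam-search decoding with an n-gram LM via
# Wav2Vec2ProcessorWithLM. Assumes: pip install pyctcdecode kenlm
import torch
from datasets import load_dataset
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

repo = "patrickvonplaten/wav2vec2-base-100h-with-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(repo)
model = AutoModelForCTC.from_pretrained(repo).eval()

# One LibriSpeech sample for illustration.
sample = next(iter(load_dataset(
    "librispeech_asr", "clean", split="validation", streaming=True
)))["audio"]

inputs = processor(sample["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# batch_decode runs beam search against the n-gram LM (via pyctcdecode)
# instead of plain argmax decoding.
print(processor.batch_decode(logits.numpy()).text[0])
```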
Thanks @patrickvonplaten, @Kuray107 for your comments. It appears I misunderstood and made an incorrect correspondence between the paper's tables and the checkpoints. Updated table is as follows; HF's models are close to the paper.
| Model | Pretraining Dataset | Fine-Tuning Dataset | Eval. Dataset | WER (%) | Relative (%) |
|---|---|---|---|---|---|
| Wav2vec 2.0 - Table 9, 9th row from bottom | LS-960 | Labelled LS-100 | test/clean | 6.1 | baseline |
| facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | test/clean | 6.1 | 0% |
| Wav2vec 2.0 - Table 9, 9th row from bottom | LS-960 | Labelled LS-100 | test/other | 13.3 | baseline |
| facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | test/other | 13.5 | 2% |
| Wav2vec 2.0 - Table 10, 9th row from bottom | LS-960 | Labelled LS-960 | test/clean | 3.4 | baseline |
| facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | test/clean | 3.4 | 0% |
| Wav2vec 2.0 - Table 10, 9th row from bottom | LS-960 | Labelled LS-960 | test/other | 8.5 | baseline |
| facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | test/other | 8.6 | 1% |