@patrickvonplaten Thank you for the great work on releasing so many wav2vec2 variants and tutorials. They are super helpful. I am new to the ASR domain and was able to reproduce some of the results with the released models. When I compared the WER to the paper, I noticed a gap between HF’s models and the paper’s scores, and I couldn’t figure out where it comes from. I have put the numbers in the table below. Could you shed some light? Thank you!
|Model|Pretraining Dataset|Fine-tuning Dataset|Eval. Dataset|WER (%)|Relative (%)|
|---|---|---|---|---|---|
|Wav2vec 2.0 - Table 1, 3rd row from bottom|LS-960|Labelled LS-100|clean/test|2.60|baseline|
|Wav2vec 2.0 - Table 1, 3rd row from bottom|LS-960|Labelled LS-100|other/test|6.3|baseline|
|Wav2vec 2.0 - Table 2, 3rd row from bottom|LS-960|Labelled LS-960|clean/test|2.1|baseline|
|Wav2vec 2.0 - Table 2, 3rd row from bottom|LS-960|Labelled LS-960|other/test|4.8|baseline|