@patrickvonplaten Thank you for the great work on releasing the many wav2vec2 variants and tutorials; they have been super helpful. I am new to the ASR domain and was able to reproduce some results with the released models. When I compared the WER against the paper, I noticed a gap between the HF models' scores and the paper's, and I couldn't figure out where it was coming from. I've put the numbers in the table below (my evaluation sketch is at the end of this post). Could you shed some light? Thank you!
Model | Pretraining Dataset | Fine-tuning Dataset | Eval. Dataset | WER (%) | Relative to paper (%) |
---|---|---|---|---|---|
Wav2vec 2.0 - Table 1, 3rd row from bottom | LS-960 | Labelled LS-100 | test-clean | 2.6 | baseline |
facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | test-clean | 6.1 | 235% |
Wav2vec 2.0 - Table 1, 3rd row from bottom | LS-960 | Labelled LS-100 | test-other | 6.3 | baseline |
facebook/wav2vec2-base-100h | LS-960 | Labelled LS-100 | test-other | 13.5 | 214% |
Wav2vec 2.0 - Table 2, 3rd row from bottom | LS-960 | Labelled LS-960 | test-clean | 2.1 | baseline |
facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | test-clean | 3.4 | 162% |
Wav2vec 2.0 - Table 2, 3rd row from bottom | LS-960 | Labelled LS-960 | test-other | 4.8 | baseline |
facebook/wav2vec2-base-960h | LS-960 | Labelled LS-960 | test-other | 8.6 | 179% |
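
For reference, this is roughly how I computed the HF numbers above: a minimal sketch using greedy CTC decoding with no language model, assuming a `datasets` version with the `Audio` feature and `jiwer` installed (my actual script may differ slightly in details):

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from jiwer import wer

model_id = "facebook/wav2vec2-base-960h"  # or "facebook/wav2vec2-base-100h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# LibriSpeech test-clean; use the "other" config for test-other
ds = load_dataset("librispeech_asr", "clean", split="test")

def transcribe(batch):
    inputs = processor(
        batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy (argmax) CTC decoding, no language model
    pred_ids = torch.argmax(logits, dim=-1)
    batch["prediction"] = processor.batch_decode(pred_ids)[0]
    return batch

ds = ds.map(transcribe)
print("WER: {:.2%}".format(wer(ds["text"], ds["prediction"])))
```

One thing I wasn't sure about: if the paper rows I picked were obtained with language-model decoding, that could explain part of the gap, but I couldn't confirm that the setups match exactly.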