### Describe the bug
Hi everyone,
I am training a speech recognition model using SpeechBrain, and while the loss is decreasing, my Word Error Rate (WER) remains high. Even after multiple epochs, it does not drop below a certain threshold.
My training setup is almost the same as the example provided in recipes/CommonVoice/transformers/conformer-large.yaml, apart from data paths and a few language-specific parameters:
• Model: Conformer (transformers recipe)
• Dataset: CommonVoice corpus
Observations:
• Training and validation loss improve, but WER remains stuck at 100%.
• Increasing epochs does not help significantly.
• Model predictions contain a lot of substitutions/deletions/insertions.
• Some outputs are completely blank or garbage text.
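
For context on the blank outputs: if the acoustic model collapses to predicting mostly the blank token, CTC-style greedy decoding produces empty hypotheses, which alone pins WER at 100%. This is only a minimal illustration of standard CTC best-path decoding (the conformer recipe uses joint CTC/attention, so it is not the recipe's actual decoder):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """CTC best-path decoding: collapse repeated tokens, then drop blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# A frame sequence dominated by the blank token decodes to an empty hypothesis:
print(ctc_greedy_decode([0, 0, 3, 3, 0, 0]))  # [3]
print(ctc_greedy_decode([0, 0, 0, 0, 0, 0]))  # []
```

Inspecting a few raw per-frame argmax sequences during validation would show whether the model is stuck emitting blanks.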
Things I have tried:
✅ Adjusting learning rate
✅ Data augmentation (speed perturbation, noise addition)
✅ Checking label alignment in transcripts
✅ Changing beam search parameters in decoding
✅ Trying different architectures
Has anyone else faced this issue? Are there specific hyperparameters or decoding techniques that helped reduce WER in SpeechBrain?
I am using it for the Uzbek language subset of CommonVoice, which has about 400 hours of data.
Would appreciate any insights! Thanks.
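
For readers puzzled by the log values below: WER is word-level edit distance (substitutions + deletions + insertions) divided by the reference length, which explains both the early values above 100% (garbage hypotheses with many insertions) and the later flat 100% (blank hypotheses, i.e. all deletions). A minimal self-contained sketch of the metric:

```python
def wer(ref, hyp):
    """Word Error Rate: 100 * (S + D + I) / N, via word-level Levenshtein distance."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(r)][len(h)] / max(len(r), 1)

print(wer("salom dunyo qalaysan", ""))  # blank hypothesis -> 100.0 (all deletions)
print(wer("salom", "a b c d e"))        # garbage hypothesis -> 500.0 (WER can exceed 100%)
```

So a WER stuck at exactly 100% with a slowly improving CER is the signature of empty or near-empty word-level hypotheses rather than of random word errors.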
### Expected behaviour
WER should decrease as training progresses.
### To Reproduce
```shell
torchrun --standalone --nproc_per_node=6 train.py hparams/conformer-large.yaml
```
### Environment Details
6 GPUs with 12 GB of VRAM each
### Relevant Log Output
```shell
Logs:
epoch: 1, lr: 8.32e-07, steps: 27 - train loss: 3.42e+02 - valid loss: 3.67e+02, valid ACC: 9.25e-05, valid WER: 1.62e+03, valid CER: 1.03e+03
epoch: 2, lr: 1.70e-06, steps: 54 - train loss: 2.82e+02 - valid loss: 2.51e+02, valid ACC: 1.05e-04, valid WER: 1.51e+03, valid CER: 1.05e+03
epoch: 3, lr: 2.56e-06, steps: 81 - train loss: 1.85e+02 - valid loss: 1.36e+02, valid ACC: 7.07e-02, valid WER: 2.65e+02, valid CER: 2.56e+02
epoch: 4, lr: 3.42e-06, steps: 108 - train loss: 1.17e+02 - valid loss: 1.13e+02, valid ACC: 7.07e-02, valid WER: 100.00, valid CER: 99.68
epoch: 5, lr: 4.29e-06, steps: 135 - train loss: 1.09e+02 - valid loss: 1.10e+02, valid ACC: 7.67e-02, valid WER: 1.00e+02, valid CER: 99.28
epoch: 6, lr: 5.15e-06, steps: 162 - train loss: 1.10e+02 - valid loss: 1.08e+02, valid ACC: 8.07e-02, valid WER: 1.00e+02, valid CER: 99.28
epoch: 7, lr: 6.02e-06, steps: 189 - train loss: 1.06e+02 - valid loss: 1.06e+02, valid ACC: 8.32e-02, valid WER: 1.00e+02, valid CER: 99.28
epoch: 8, lr: 6.88e-06, steps: 216 - train loss: 98.01 - valid loss: 1.05e+02, valid ACC: 8.38e-02, valid WER: 1.00e+02, valid CER: 99.28
epoch: 9, lr: 7.74e-06, steps: 243 - train loss: 1.09e+02 - valid loss: 1.04e+02, valid ACC: 8.42e-02, valid WER: 1.00e+02, valid CER: 99.28
epoch: 10, lr: 8.61e-06, steps: 270 - train loss: 1.02e+02 - valid loss: 1.03e+02, valid ACC: 8.62e-02, valid WER: 99.99, valid CER: 98.42
epoch: 11, lr: 9.47e-06, steps: 297 - train loss: 1.05e+02 - valid loss: 1.02e+02, valid ACC: 8.71e-02, valid WER: 100.00, valid CER: 97.57
epoch: 12, lr: 1.03e-05, steps: 324 - train loss: 96.42 - valid loss: 1.02e+02, valid ACC: 8.47e-02, valid WER: 100.00, valid CER: 97.19
epoch: 13, lr: 1.12e-05, steps: 351 - train loss: 96.79 - valid loss: 1.01e+02, valid ACC: 8.70e-02, valid WER: 100.00, valid CER: 97.20
epoch: 14, lr: 1.21e-05, steps: 378 - train loss: 96.04 - valid loss: 99.98, valid ACC: 9.07e-02, valid WER: 1.00e+02, valid CER: 97.51
```
### Additional Context
_No response_