I have fine-tuned a Whisper-small model following the "Fine-tuning the ASR model" chapter of the Hugging Face Audio Course, and I have observed that the eval steps are much slower (by a factor of 2-3) than the training steps. This seems strange to me: I would have thought that an eval step just does a forward pass, which also happens in a training step, so it should not be any slower. I am using the same batch size for training and eval, I am not doing beam search, and the decoding and WER computation are not included in the eval step time. While this does not influence the end result, I would still like to understand the training process better.
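One possible explanation, assuming the training script follows the course and passes `predict_with_generate=True` to `Seq2SeqTrainer`: a training step runs the decoder once with teacher forcing (the whole target sequence in a single forward pass), while an eval step that generates transcripts runs the decoder autoregressively, one forward pass per emitted token, before any decoding or WER computation happens. The toy function names below are hypothetical and only count decoder invocations to illustrate the asymmetry:

```python
# Hypothetical sketch: count decoder forward passes per step.
# This is NOT the Trainer's actual code, just an illustration of
# teacher forcing vs. autoregressive generation.

def decoder_passes_training(num_target_tokens: int) -> int:
    # Teacher forcing: the full shifted target sequence is fed to the
    # decoder at once, so one forward pass covers every position.
    return 1

def decoder_passes_generation(num_generated_tokens: int) -> int:
    # Autoregressive generation: each new token requires another
    # decoder forward pass conditioned on the tokens so far.
    return num_generated_tokens

# A typical short transcript of ~50 tokens:
print(decoder_passes_training(50))    # 1 forward pass
print(decoder_passes_generation(50))  # 50 forward passes
```

Even with key/value caching making each generation step cheap, the sequential dependency between steps prevents the parallelism a single teacher-forced pass gets, which would be consistent with eval steps taking a few times longer than training steps at the same batch size.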