Whisper fine-tuning slow eval

f5QZU81BAbyAeqpnoBPa · February 28, 2024, 12:39pm

I have fine-tuned a Whisper-small model using the guide "Fine-tuning the ASR model - Hugging Face Audio Course" and have observed that the eval steps are much slower than the training steps, by a factor of about 2-3. This seems very strange to me. I would have thought that an eval step just does a forward pass, which also happens in every training step, so it should not be any slower (if anything, it should be faster, since there is no backward pass or optimizer update). I am using the same batch size for training and eval, I am not doing beam search, and the decoding and WER computation are not included in the eval step time. While this does not affect the end result, I would still like to understand the training process better.
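One likely explanation, assuming the training arguments set `predict_with_generate=True` as the course's `Seq2SeqTrainingArguments` do (so that WER can be computed from generated text): the eval step then calls `generate()`, which is autoregressive. During training, the decoder is teacher-forced. The ground-truth labels are fed in, so every output position is predicted in one parallel forward pass. During generation, each token depends on the previously predicted one, so the decoder has to run once per output token. Even with key/value caching, those steps are sequential and individually small, so they use the GPU far less efficiently than one large batched pass. The toy sketch below only counts decoder calls in each mode. `ToyDecoder`, `teacher_forced_step` and `greedy_generate` are hypothetical illustrations, not the Whisper or Transformers API.

```python
from typing import List


class ToyDecoder:
    """Hypothetical stand-in for a seq2seq decoder that counts its calls."""

    def __init__(self, vocab_size: int = 10, eos_id: int = 0) -> None:
        self.vocab_size = vocab_size
        self.eos_id = eos_id
        self.calls = 0

    def forward(self, tokens: List[int]) -> List[int]:
        # One call processes the whole prefix in parallel and returns a
        # "predicted next token" for every position (deterministic toy rule).
        self.calls += 1
        return [(t + 1) % self.vocab_size for t in tokens]


def teacher_forced_step(decoder: ToyDecoder, labels: List[int]) -> List[int]:
    # Training (and loss-only eval): ground-truth labels are fed in,
    # so all positions are predicted in a single decoder call.
    return decoder.forward(labels)


def greedy_generate(decoder: ToyDecoder, start: int, max_len: int) -> List[int]:
    # Generation: each new token depends on the previous prediction,
    # so the decoder is called once per generated token.
    out = [start]
    for _ in range(max_len):
        next_tok = decoder.forward(out)[-1]
        out.append(next_tok)
        if next_tok == decoder.eos_id:
            break
    return out


if __name__ == "__main__":
    labels = [1, 2, 3, 4, 5, 6, 7, 8]

    tf_decoder = ToyDecoder(vocab_size=100)
    teacher_forced_step(tf_decoder, labels)
    print("teacher-forced calls:", tf_decoder.calls)  # 1

    gen_decoder = ToyDecoder(vocab_size=100)
    greedy_generate(gen_decoder, start=1, max_len=len(labels))
    print("generation calls:", gen_decoder.calls)  # 8
```

For the same target length, generation makes one decoder call per token where teacher forcing makes a single call. Real transcripts can be much longer than 8 tokens, which would plausibly account for a several-fold slowdown. If that is the cause, setting `predict_with_generate=False` should bring the eval step back to roughly forward-pass speed. The trade-off is that eval then reports only the teacher-forced loss, not WER.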

